Abstract

We present a tail inequality for suprema of empirical processes generated by
variables with finite $\psi_\alpha$ norms and apply it to some geometrically ergodic Markov
chains to derive similar estimates for empirical processes of such chains, generated
by bounded functions. We also obtain a bounded difference inequality for symmetric
statistics of such Markov chains.
1 Introduction

Let us consider a sequence $X_1, X_2, \ldots, X_n$ of random variables with values in a
measurable space $(S, \mathcal{B})$ and a countable class $\mathcal{F}$ of measurable functions
$f\colon S \to \mathbb{R}$. Define moreover the random variable

\[
Z = \sup_{f\in\mathcal{F}} \Bigl|\sum_{i=1}^n f(X_i)\Bigr|.
\]
In recent years a lot of effort has been devoted to describing the behaviour, in
particular the concentration properties, of the variable $Z$ under various assumptions on
the sequence $X_1, \ldots, X_n$ and the class $\mathcal{F}$. Classically, one considers the case of
i.i.d. or independent random variables $X_i$ and uniformly bounded classes of functions,
although there are also results for unbounded functions or sequences of variables
satisfying some mixing conditions.

The aim of this paper is to present tail inequalities for the variable Z under two 
different types of assumptions, relaxing the classical conditions. 

In the first part of the article we consider the case of independent variables and
unbounded functions (satisfying however some integrability assumptions). The main
result of this part is Theorem 4, presented in Section 2.

In the second part we keep the assumption of uniform boundedness of the class
$\mathcal{F}$ but relax the condition on the underlying sequence of variables, by considering
a class of Markov chains satisfying classical small set conditions with exponentially
integrable regeneration times. If the small set assumption is satisfied for the one step
transition kernel, the regeneration technique for Markov chains, together with the results
for independent variables and unbounded functions, allows us to derive tail inequalities
for the variable $Z$ (Theorem 7, presented in Section 3.2).

In a more general situation, when the small set assumption is satisfied only by the
$m$-skeleton chain, our results are restricted to sums of real variables, i.e. to the case of
$\mathcal{F}$ being a singleton (Theorem 6).

Finally, in Section 3.3, using similar arguments, we derive a bounded difference type
inequality for Markov chains satisfying the same small set assumptions.

We will start by describing known results for bounded classes of functions and 
independent random variables, beginning with the celebrated Talagrand's inequality. 
They will serve us both as tools and as a point of reference for presenting our results. 

1.1 Talagrand's concentration inequalities 

In the paper [21], Talagrand proved the following inequality for empirical processes.

Theorem 1 (Talagrand, [23]). Let $X_1, \ldots, X_n$ be independent random variables with
values in a measurable space $(S, \mathcal{B})$ and let $\mathcal{F}$ be a countable class of measurable
functions $f\colon S \to \mathbb{R}$, such that $\|f\|_\infty \le a < \infty$ for every $f \in \mathcal{F}$. Consider the random
variable $Z = \sup_{f\in\mathcal{F}} \sum_{i=1}^n f(X_i)$. Then for all $t > 0$,

\[
\mathbb{P}(Z \ge \mathbb{E}Z + t) \le K \exp\Bigl(-\frac{t}{Ka}\log\Bigl(1 + \frac{ta}{V}\Bigr)\Bigr), \tag{1}
\]

where $V = \mathbb{E}\sup_{f\in\mathcal{F}} \sum_{i=1}^n f(X_i)^2$ and $K$ is an absolute constant. In consequence, for
all $t > 0$,

\[
\mathbb{P}(Z \ge \mathbb{E}Z + t) \le K_1 \exp\Bigl(-\frac{1}{K_1}\,\frac{t^2}{V + at}\Bigr) \tag{2}
\]

for some universal constant $K_1$. Moreover, the above inequalities hold when replacing
$Z$ by $-Z$.

Inequalities (1) and (2) may be considered functional versions of, respectively, Bennett's
and Bernstein's inequalities for sums of independent random variables and, similarly
as in the classical case, one of them implies the other. Let us note that Bennett's
inequality recovers both the subgaussian and Poisson behaviour of sums of independent
random variables, corresponding to classical limit theorems, whereas Bernstein's
inequality recovers the subgaussian behaviour for small values and exhibits exponential
behaviour for larger values of $t$.
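
For instance, the passage from (1) to (2) rests only on the elementary bound $\log(1+x) \ge x/(1+x)$ for $x \ge 0$. The following lines are a short sketch of this step, reading the exponent in (1) as $\frac{t}{Ka}\log(1 + \frac{ta}{V})$ and the exponent in (2) as $\frac{1}{K_1}\frac{t^2}{V+at}$; no attempt is made to optimise the constant.

```latex
\[
\frac{t}{Ka}\log\Bigl(1+\frac{ta}{V}\Bigr)
\;\ge\; \frac{t}{Ka}\cdot\frac{ta/V}{1+ta/V}
\;=\; \frac{1}{K}\,\frac{t^{2}}{V+at},
\qquad t \ge 0,
\]
% hence (1) yields (2) with K_1 = K.
```

For small $t$ (that is, $ta \ll V$) the right-hand side behaves like $t^2/(KV)$, the subgaussian regime, while for large $t$ it behaves like $t/(Ka)$, the exponential regime mentioned above.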

The above inequalities proved to be a very important tool in infinite dimensional
probability, machine learning and M-estimation. They drew considerable attention,
resulting in several simplified proofs and different versions. In particular, there has
been a series of papers, starting from the work by Ledoux, exploring concentration
of measure for empirical processes with the use of logarithmic Sobolev inequalities with
discrete gradients. The first explicit constants were obtained by Massart, who
proved in particular the following

Theorem 2 (Massart). Let $X_1, \ldots, X_n$ be independent random variables with values
in a measurable space $(S, \mathcal{B})$ and let $\mathcal{F}$ be a countable class of measurable functions
$f\colon S \to \mathbb{R}$, such that $\|f\|_\infty \le a < \infty$ for every $f \in \mathcal{F}$. Consider the random variable
$Z = \sup_{f\in\mathcal{F}} |\sum_{i=1}^n f(X_i)|$. Assume moreover that for all $f \in \mathcal{F}$ and all $i$, $\mathbb{E}f(X_i) = 0$,
and let $\sigma^2 = \sup_{f\in\mathcal{F}} \sum_{i=1}^n \mathbb{E}f(X_i)^2$. Then for all $\eta > 0$ and $t > 0$,

\[
\mathbb{P}\bigl(Z \ge (1+\eta)\mathbb{E}Z + \sigma\sqrt{2K_1 t} + K_2(\eta)\,at\bigr) \le e^{-t}
\]

and

\[
\mathbb{P}\bigl(Z \le (1-\eta)\mathbb{E}Z - \sigma\sqrt{2K_3 t} - K_4(\eta)\,at\bigr) \le e^{-t},
\]

where $K_1 = 4$, $K_2(\eta) = 2.5 + 32/\eta$, $K_3 = 5.4$, $K_4(\eta) = 2.5 + 43.2/\eta$.

Similar, more refined results were obtained subsequently by Bousquet [2] and Klein 
and Rio [7] . The latter article contains an inequality for suprema of empirical processes 
with the best known constants. 
Theorem 3 (Klein, Rio, [7], Theorems 1.1, 1.2). Let $X_1, X_2, \ldots, X_n$ be independent
random variables with values in a measurable space $(S, \mathcal{B})$ and let $\mathcal{F}$ be a countable class
of measurable functions $f\colon S \to [-a, a]$, such that for all $i$, $\mathbb{E}f(X_i) = 0$. Consider the
random variable

\[
Z = \sup_{f\in\mathcal{F}} \sum_{i=1}^n f(X_i).
\]

Then, for all $t > 0$,

\[
\mathbb{P}(Z \ge \mathbb{E}Z + t) \le \exp\Bigl(-\frac{t^2}{2(\sigma^2 + 2a\mathbb{E}Z) + 3at}\Bigr)
\]

and

\[
\mathbb{P}(Z \le \mathbb{E}Z - t) \le \exp\Bigl(-\frac{t^2}{2(\sigma^2 + 2a\mathbb{E}Z) + 3at}\Bigr),
\]

where

\[
\sigma^2 = \sup_{f\in\mathcal{F}} \sum_{i=1}^n \mathbb{E}f(X_i)^2.
\]

The reader may notice that, contrary to the original Talagrand's result, the estimates
of Theorems 2 and 3 use the 'weak' variance $\sigma^2$ rather than the 'strong' parameter
$V$ of Theorem 1. This stems from several reasons, e.g. the statistical relevance of the
parameter $\sigma$ and the analogy with the concentration of Gaussian processes (which, by the CLT,
in the case of Donsker classes of functions correspond to the limiting behaviour of
empirical processes). One should also note that by the contraction principle we have
$\sigma^2 \le V \le \sigma^2 + 16a\mathbb{E}Z$ (see [12], Lemma 6.6). Thus, usually one would like to describe
the subgaussian behaviour of the variable $Z$ in terms of $\sigma$; however, the price
to be paid is the additional summand of the form $\eta\mathbb{E}Z$. Let us also remark that if one
does not pay attention to constants, the inequalities presented in Theorems 2 and 3 follow
from Talagrand's inequality just by the aforementioned estimate $V \le \sigma^2 + 16a\mathbb{E}Z$ and
the inequality between the geometric and the arithmetic mean (in the case of Theorem 2).
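
To indicate how the summand $\eta\mathbb{E}Z$ arises, here is a sketch of the computation behind the last remark. It uses only the estimate $V \le \sigma^2 + 16a\mathbb{E}Z$, the subadditivity of the square root and the inequality $2\sqrt{xy} \le x + y$; the constants are not optimised and the inversion of the tail bound is left aside.

```latex
\[
\sqrt{Vt}
\;\le\; \sqrt{(\sigma^{2}+16a\,\mathbb{E}Z)\,t}
\;\le\; \sigma\sqrt{t} + 4\sqrt{at\,\mathbb{E}Z}
\;=\; \sigma\sqrt{t} + 2\sqrt{(\eta\,\mathbb{E}Z)\cdot\frac{4at}{\eta}}
\;\le\; \sigma\sqrt{t} + \eta\,\mathbb{E}Z + \frac{4}{\eta}\,at,
\qquad \eta>0,\ t\ge 0 .
\]
```

Thus a deviation of order $\sqrt{Vt} + at$ above $\mathbb{E}Z$ is at most of order $\eta\mathbb{E}Z + \sigma\sqrt{t} + (1 + \eta^{-1})at$, which is the shape of the bound in Theorem 2.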

1.2 Notation, basic definitions 

In the article, by $K$ we will denote universal constants and by $C(\alpha, \beta)$, $K_\alpha$ constants
depending only on $\alpha, \beta$ or only on $\alpha$, respectively (where $\alpha, \beta$ are some parameters). In both
cases the values of constants may change from line to line.

We will also use the classical definition of (exponential) Orlicz norms.

Definition 1. For $\alpha > 0$, define the function $\psi_\alpha\colon \mathbb{R}_+ \to \mathbb{R}_+$ with the formula
$\psi_\alpha(x) = \exp(x^\alpha) - 1$. For a random variable $X$, define also the Orlicz norm

\[
\|X\|_{\psi_\alpha} = \inf\{\lambda > 0\colon \mathbb{E}\psi_\alpha(|X|/\lambda) \le 1\}.
\]
Let us also note a basic fact that we will use in the sequel, namely that by
Chebyshev's inequality, for $t > 0$,

\[
\mathbb{P}(|X| \ge t) \le 2\exp\Bigl(-\Bigl(\frac{t}{\|X\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr).
\]

Remark For $\alpha < 1$ the above definition does not give a norm but only a quasi-norm.
It can be fixed by changing the function $\psi_\alpha$ near zero, to make it convex (which would
give an equivalent norm). It is however widely accepted in the literature to use the word
norm also for the quasi-norm given by our definition.
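
As an illustration of Definition 1, the following Python sketch computes $\|X\|_{\psi_\alpha}$ for a finitely supported random variable by bisection on $\lambda$, and evaluates the right-hand side of the Chebyshev-type tail estimate. It is not part of the argument; the function names (`psi_alpha_mean`, `orlicz_norm`, `chebyshev_tail_bound`) and the overflow cutoff are our own illustrative choices.

```python
import math

def psi_alpha_mean(values, probs, lam, alpha):
    """E psi_alpha(|X| / lam) for a finitely supported X (inf on overflow)."""
    total = 0.0
    for v, p in zip(values, probs):
        if p == 0.0:
            continue
        x = (abs(v) / lam) ** alpha
        if x > 700.0:                  # exp(x) would overflow a double
            return math.inf
        total += p * math.expm1(x)     # expm1(x) = exp(x) - 1 = psi_alpha(|v|/lam)
    return total

def orlicz_norm(values, probs, alpha, tol=1e-10):
    """Bisection for inf{lam > 0 : E psi_alpha(|X| / lam) <= 1}."""
    if all(v == 0 for v, p in zip(values, probs) if p > 0):
        return 0.0
    lo, hi = 0.0, 1.0
    # The mean is decreasing in lam, so double hi until the constraint holds.
    while psi_alpha_mean(values, probs, hi, alpha) > 1.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_alpha_mean(values, probs, mid, alpha) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

def chebyshev_tail_bound(t, norm, alpha):
    """Right-hand side 2 exp(-(t / ||X||)^alpha) of the tail estimate."""
    return 2.0 * math.exp(-((t / norm) ** alpha))

# A symmetric +-1 variable: exp(1/lam) - 1 <= 1 gives the norm 1 / log 2.
rademacher_norm = orlicz_norm([-1.0, 1.0], [0.5, 0.5], alpha=1.0)
```

For the symmetric $\pm 1$ variable and $\alpha = 1$, the tail estimate at $t = 1$ reads $1 \le 2\exp(-\log 2) = 1$, so the constant 2 cannot be lowered in general.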



2 Tail inequality for suprema of empirical processes corresponding
to classes of unbounded functions

2.1 The main result for the independent case 

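We will now formulate our main result in the setting of independent variables, namely
tail estimates for suprema of empirical processes under the assumption that the
summands have finite $\psi_\alpha$ Orlicz norm.

Theorem 4. Let $X_1, \ldots, X_n$ be independent random variables with values in a
measurable space $(S, \mathcal{B})$ and let $\mathcal{F}$ be a countable class of measurable functions $f\colon S \to \mathbb{R}$.
Assume that for every $f \in \mathcal{F}$ and every $i$, $\mathbb{E}f(X_i) = 0$ and for some $\alpha \in (0, 1]$ and all
$i$, $\|\sup_{f\in\mathcal{F}} |f(X_i)|\|_{\psi_\alpha} < \infty$. Let

\[
Z = \sup_{f\in\mathcal{F}} \Bigl|\sum_{i=1}^n f(X_i)\Bigr|.
\]

Define moreover

\[
\sigma^2 = \sup_{f\in\mathcal{F}} \sum_{i=1}^n \mathbb{E}f(X_i)^2.
\]

Then, for all $0 < \eta < 1$ and $\delta > 0$, there exists a constant $C = C(\alpha, \eta, \delta)$, such that
for all $t > 0$,

\[
\mathbb{P}(Z \ge (1+\eta)\mathbb{E}Z + t)
\le \exp\Bigl(-\frac{t^2}{2(1+\delta)\sigma^2}\Bigr)
+ 3\exp\Bigl(-\Bigl(\frac{t}{C\,\|\max_{i}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr)
\]

and

\[
\mathbb{P}(Z \le (1-\eta)\mathbb{E}Z - t)
\le \exp\Bigl(-\frac{t^2}{2(1+\delta)\sigma^2}\Bigr)
+ 3\exp\Bigl(-\Bigl(\frac{t}{C\,\|\max_{i}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr).
\]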
Remark The above theorem may thus be considered a counterpart of Massart's result
(Theorem 2). It is written in a slightly different manner, reflecting the use of Theorem
3 in the proof, but it is easy to see that if one disregards the constants, it yields another
version in the flavour of the inequalities presented in Theorem 2.

Let us note that some weaker (e.g. not recovering the proper power $\alpha$ in the
subexponential decay of the tail) inequalities may be obtained by combining the Pisier
inequality (see (13) below) with moment estimates for empirical processes proven by
Giné, Latała and Zinn [5] and later obtained by a different method also by Bousquet,
Boucheron, Lugosi and Massart [3]. These moment estimates first appeared in the
context of tail inequalities for $U$-statistics and were later used in statistics, in model
selection. They are however also of independent interest as extensions of the classical
Rosenthal's inequalities for $p$-th moments of sums of independent random variables
(with the dependence on $p$ stated explicitly).



The proof of Theorem 4 is a compilation of the classical Hoffmann-Jørgensen inequality
with Theorem 3 and another deep result due to Talagrand.

Theorem 5 (Ledoux, Talagrand, [12], Theorem 6.21, p. 172). In the setting of
Theorem 4, we have

\[
\|Z\|_{\psi_\alpha} \le K_\alpha\Bigl(\|Z\|_1 + \Bigl\|\max_{i}\sup_{f\in\mathcal{F}}|f(X_i)|\Bigr\|_{\psi_\alpha}\Bigr).
\]

We will also need the following corollary to Theorem 3, which was derived in [1].
Since the proof is very short we will present it here for the sake of completeness.

Lemma 1. In the setting of Theorem 3, for all $0 < \eta < 1$, $\delta > 0$ there exists a constant
$C = C(\eta, \delta)$, such that for all $t > 0$,

\[
\mathbb{P}(Z \ge (1+\eta)\mathbb{E}Z + t) \le \exp\Bigl(-\frac{t^2}{2(1+\delta)\sigma^2}\Bigr) + \exp\Bigl(-\frac{t}{Ca}\Bigr)
\]

and

\[
\mathbb{P}(Z \le (1-\eta)\mathbb{E}Z - t) \le \exp\Bigl(-\frac{t^2}{2(1+\delta)\sigma^2}\Bigr) + \exp\Bigl(-\frac{t}{Ca}\Bigr).
\]

Proof. It is enough to notice that for all $\delta > 0$,

\[
\exp\Bigl(-\frac{t^2}{2(\sigma^2 + 2a\mathbb{E}Z) + 3at}\Bigr)
\le \exp\Bigl(-\frac{t^2}{2(1+\delta)\sigma^2}\Bigr)
+ \exp\Bigl(-\frac{t^2}{(1+\delta^{-1})(4a\mathbb{E}Z + 3ta)}\Bigr)
\]

and use this inequality together with Theorem 3 for $t + \eta\mathbb{E}Z$ instead of $t$, which gives
$C = (1 + 1/\delta)(3 + 2\eta^{-1})$. □
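
For the reader's convenience, the two elementary steps of this proof can be sketched as follows, with the shorthand $A = 2\sigma^2$ and $B = 4a\mathbb{E}Z + 3at$ (our notation, used only here). The first line justifies splitting the exponent; the second shows where the constant $C = (1 + 1/\delta)(3 + 2\eta^{-1})$ comes from after replacing $t$ by $t + \eta\mathbb{E}Z$.

```latex
% Step 1: if B <= delta*A then A + B <= (1+delta)A, otherwise A + B <= (1+1/delta)B, so
\[
A + B \le \max\bigl\{(1+\delta)A,\ (1+\delta^{-1})B\bigr\}
\quad\Longrightarrow\quad
e^{-t^{2}/(A+B)} \le e^{-t^{2}/((1+\delta)A)} + e^{-t^{2}/((1+\delta^{-1})B)} .
\]
% Step 2: with s = t + eta*EZ in place of t,
\[
(3+2\eta^{-1})\,s^{2}
\;\ge\; (3+2\eta^{-1})\,(t^{2} + 2t\eta\,\mathbb{E}Z)
\;\ge\; t\,\bigl(4\,\mathbb{E}Z + 3s\bigr),
\qquad\text{hence}\qquad
\frac{s^{2}}{(1+\delta^{-1})(4a\,\mathbb{E}Z + 3as)} \;\ge\; \frac{t}{(1+\delta^{-1})(3+2\eta^{-1})\,a} .
\]
```

Since also $s^2 \ge t^2$, the first exponential is bounded by $\exp(-t^2/(2(1+\delta)\sigma^2))$, which gives the statement of the lemma.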
Proof of Theorem 4. Without loss of generality we may and will assume that

\[
t \Big/ \Bigl\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\Bigr\|_{\psi_\alpha} > K(\alpha, \eta, \delta), \tag{3}
\]

otherwise we can make the theorem trivial by choosing the constant $C = C(\eta, \delta, \alpha)$ to
be large enough. The conditions on the constant $K(\alpha, \eta, \delta)$ will be imposed later on in
the proof.

Let $\varepsilon = \varepsilon(\delta) > 0$ (its value will be determined later) and for all $f \in \mathcal{F}$ consider
the truncated functions $f_1(x) = f(x)\mathbf{1}_{\{\sup_{f\in\mathcal{F}}|f(x)| \le \rho\}}$ (the truncation level $\rho$ will also
be fixed later). Define also functions $f_2(x) = f(x) - f_1(x) = f(x)\mathbf{1}_{\{\sup_{f\in\mathcal{F}}|f(x)| > \rho\}}$. Let
$\mathcal{F}_i = \{f_i\colon f \in \mathcal{F}\}$, $i = 1, 2$.

We have 

n n 

Z = sup | V f(Xi)\ < sup | T(fi(Xi) - Eh(Xi))\ 
fer ^ her, ^ 

rt 

+ sup I V"(/ 2 (Xj) - E/ 2 (Xj))| (4) 

and 

n 

Z> sup | V7/i(Xj) - E/i(Xj))| 

n 

- sup |^(/ 2 (Xi)-E/ 2 (Xi))| (5) 

where we used the fact that E/ipQ) + E/ 2 (X;) = for all / G :F. 
Similarly, by Jensen's inequality, we get 

n n 

E sup |^(/i(X i )-E/ 1 (X i ))|-2E sup |^/ 2 pQ)| < EZ 

n n 

< E sup | V(/ a (Xi) -Eh(Xi))\ + 2E sup | V/ 2 (Xi)l- (6) 



Denoting 



and 



A = E sup | V7/i(Xj) -E/i(Xj))| 



S = E sup | V/ 2 pQ 
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we get by dU and ©, 

P(Z > (l+rj)EZ + t) 

n 

<P( sup | - E/xpQ))) > (1 + r/)EZ + (1 - e)i) 

fieri t^i 

n 

+ P( sup | V(/ 2 pQ) - E/ 2 (Xi))| > et) 
her 2 ^ 

n 

<P( sup | - E/i(Xi))| > (1 + ^ - 45 + (1 - e)t) 

hen t^i 

n 

+ P( sup I V(/ 2 (*i) - E/ 2 (Xj))| > et) (7) 

and similarly by (|5|) and ©, 

F(Z < (l-i])EZ -t) 

n 

<P( sup | - E/i(XO)| < (1 - r/)EZ - (1 - e)t) 

+ P( sup | ^(/ 2 pQ) - E/ 2 (Xj))| > et) 

n 

<P( sup | YY/ipQ) - E/i(Xi))| < (1 - ??)^ - (1 - e)t + 25) 

n 

+ P( sup I ^2(f 2 (Xi) - Ef 2 (Xi))\ > et). (8) 
heT % i=1 

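We would like to choose a truncation level $\rho$ in a way which would allow us to bound
the first summands on the right-hand sides of (7) and (8) with Lemma 1 and the other
summands with Theorem 5.

To this end let us set

\[
\rho = 8\,\mathbb{E}\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|
\le K_\alpha\Bigl\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\Bigr\|_{\psi_\alpha}. \tag{9}
\]

Let us notice that by the Chebyshev inequality and the definition of the class $\mathcal{F}_2$, we
have

\[
\mathbb{P}\Bigl(\max_{k\le n}\sup_{f_2\in\mathcal{F}_2}\Bigl|\sum_{i=1}^k f_2(X_i)\Bigr| > 0\Bigr)
\le \mathbb{P}\Bigl(\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)| > \rho\Bigr) \le \frac{1}{8}
\]

and thus by the Hoffmann-Jørgensen inequality (see e.g. [12], Chapter 6, Proposition
6.8, inequality (6.8)), we obtain

\[
B = \mathbb{E}\sup_{f_2\in\mathcal{F}_2}\Bigl|\sum_{i=1}^n f_2(X_i)\Bigr| \le 8\,\mathbb{E}\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|. \tag{10}
\]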



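In consequence

\[
\mathbb{E}\sup_{f_2\in\mathcal{F}_2}\Bigl|\sum_{i=1}^n \bigl(f_2(X_i) - \mathbb{E}f_2(X_i)\bigr)\Bigr|
\le 16\,\mathbb{E}\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|
\le K_\alpha\Bigl\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\Bigr\|_{\psi_\alpha}.
\]

We also have

\[
\begin{aligned}
\Bigl\|\max_{1\le i\le n}\sup_{f_2\in\mathcal{F}_2}|f_2(X_i) - \mathbb{E}f_2(X_i)|\Bigr\|_{\psi_\alpha}
&\le K_\alpha\Bigl(\Bigl\|\max_{1\le i\le n}\sup_{f_2\in\mathcal{F}_2}|f_2(X_i)|\Bigr\|_{\psi_\alpha}
+ \mathbb{E}\max_{1\le i\le n}\sup_{f_2\in\mathcal{F}_2}|f_2(X_i)|\Bigr) \\
&\le K_\alpha\Bigl\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\Bigr\|_{\psi_\alpha}
\end{aligned}
\]

(recall that with our definitions, for $\alpha < 1$, $\|\cdot\|_{\psi_\alpha}$ is a quasi-norm, which explains the
presence of the constant $K_\alpha$ in the first inequality). Thus, by Theorem 5, we obtain

\[
\Bigl\|\sup_{f_2\in\mathcal{F}_2}\Bigl|\sum_{i=1}^n \bigl(f_2(X_i) - \mathbb{E}f_2(X_i)\bigr)\Bigr|\Bigr\|_{\psi_\alpha}
\le K_\alpha\Bigl\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\Bigr\|_{\psi_\alpha},
\]

which implies

\[
\mathbb{P}\Bigl(\sup_{f_2\in\mathcal{F}_2}\Bigl|\sum_{i=1}^n \bigl(f_2(X_i) - \mathbb{E}f_2(X_i)\bigr)\Bigr| \ge \varepsilon t\Bigr)
\le 2\exp\Bigl(-\Bigl(\frac{\varepsilon t}{K_\alpha\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr). \tag{11}
\]

Let us now choose $\varepsilon < 1/10$ and such that

\[
(1 - 5\varepsilon)^{-2}(1 + \delta/2) \le (1 + \delta). \tag{12}
\]

Since $\varepsilon$ is a function of $\delta$, in view of (9) and (10), we can choose the constant
$K(\alpha, \eta, \delta)$ in (3) to be large enough, to assure that

\[
B \le 8\,\mathbb{E}\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)| \le \varepsilon t.
\]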
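Notice moreover that for every $f \in \mathcal{F}$, we have
$\mathbb{E}(f_1(X_i) - \mathbb{E}f_1(X_i))^2 \le \mathbb{E}f_1(X_i)^2 \le \mathbb{E}f(X_i)^2$.

Thus, using inequalities (7), (8), (11) and Lemma 1 (applied for $\eta$ and $\delta/2$), we
obtain

\[
\begin{aligned}
\mathbb{P}(Z \ge (1+\eta)\mathbb{E}Z + t),\ \mathbb{P}(Z \le (1-\eta)\mathbb{E}Z - t)
&\le \exp\Bigl(-\frac{t^2(1-5\varepsilon)^2}{2(1+\delta/2)\sigma^2}\Bigr)
+ \exp\Bigl(-\frac{(1-5\varepsilon)t}{K(\eta,\delta)\rho}\Bigr) \\
&\quad + 2\exp\Bigl(-\Bigl(\frac{\varepsilon t}{K_\alpha\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr).
\end{aligned}
\]

Since $\varepsilon < 1/10$, using (9) one can see that for $t$ satisfying (3) with $K(\alpha, \eta, \delta)$ large
enough, we have

\[
\exp\Bigl(-\frac{(1-5\varepsilon)t}{K(\eta,\delta)\rho}\Bigr),\
\exp\Bigl(-\Bigl(\frac{\varepsilon t}{K_\alpha\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr)
\le \exp\Bigl(-\Bigl(\frac{t}{C(\alpha,\eta,\delta)\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr)
\]

(note that the above inequality holds for all $t$ if $\alpha = 1$).
Therefore, for such $t$,

\[
\begin{aligned}
\mathbb{P}(Z \ge (1+\eta)\mathbb{E}Z + t),\ \mathbb{P}(Z \le (1-\eta)\mathbb{E}Z - t)
&\le \exp\Bigl(-\frac{t^2(1-5\varepsilon)^2}{2(1+\delta/2)\sigma^2}\Bigr) \\
&\quad + 3\exp\Bigl(-\Bigl(\frac{t}{C(\alpha,\eta,\delta)\|\max_{1\le i\le n}\sup_{f\in\mathcal{F}}|f(X_i)|\|_{\psi_\alpha}}\Bigr)^{\alpha}\Bigr).
\end{aligned}
\]

To finish the proof it is now enough to use (12). □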



Remark We would like to point out that the use of the Hoffmann-Jørgensen inequality
in a similar context is well known. Such applications appeared in the proof of the
aforementioned moment estimates for empirical processes by Giné, Latała, Zinn [5],
in the proof of Theorem 5 and recently in the proof of Fuk-Nagaev type inequalities
for empirical processes used by Einmahl and Li to investigate generalized laws of the
iterated logarithm for Banach space valued variables [3].

As for using Theorem 5 to control the remainder after truncating the original random
variables, it was recently used in a somewhat similar way by Mendelson and
Tomczak-Jaegermann (see [19]).



2.2 A counterexample 

We will now present a simple example, showing that in Theorem 4 one cannot replace
$\|\sup_f \max_i |f(X_i)|\|_{\psi_\alpha}$ with $\max_i \|\sup_f |f(X_i)|\|_{\psi_\alpha}$. With such a modification, the
inequality fails to be true even in the real valued case, i.e. when $\mathcal{F}$ is a singleton. For
simplicity we will consider only the case $\alpha = 1$.

Consider a sequence $Y_1, Y_2, \ldots$ of i.i.d. real random variables, such that
$\mathbb{P}(Y_i = r) = e^{-r} = 1 - \mathbb{P}(Y_i = 0)$. Let $\varepsilon_1, \varepsilon_2, \ldots$ be a Rademacher sequence, independent of
$(Y_i)_i$. Define finally $X_i = \varepsilon_i Y_i$. We have

\[
\mathbb{E}e^{|X_i|} = e^{r}e^{-r} + (1 - e^{-r}) \le 2,
\]

so $\|X_i\|_{\psi_1} \le 1$. Moreover

\[
\mathbb{E}|X_i|^2 = r^2 e^{-r}.
\]

Assume now that we have for all $n, r \in \mathbb{N}$ and $t > 0$,

\[
\mathbb{P}\Bigl(\Bigl|\sum_{i=1}^n X_i\Bigr| \ge K\bigl(\sqrt{nt}\,\|X_1\|_2 + t\|X_1\|_{\psi_1}\bigr)\Bigr) \le Ke^{-t},
\]

where $K$ is an absolute constant (which would hold if the corresponding version of
Theorem 4 was true).

For sufficiently large $r$, the above inequality applied with $n \simeq e^{r}r^{-2}$ and $t \simeq r$
(so that $\sqrt{nt}\,\|X_1\|_2 \simeq \sqrt{t}$) implies that

\[
\mathbb{P}\Bigl(\Bigl|\sum_{i=1}^n X_i\Bigr| \ge r\Bigr) \le Ke^{-r/K}.
\]

On the other hand, by Lévy's inequality, we have

\[
2\,\mathbb{P}\Bigl(\Bigl|\sum_{i=1}^n X_i\Bigr| \ge r\Bigr) \ge \mathbb{P}\bigl(\max_{i\le n}|X_i| \ge r\bigr)
\ge \frac{1}{2}\min\bigl(n\mathbb{P}(|X_1| \ge r), 1\bigr) \ge \frac{1}{2}r^{-2},
\]

which gives a contradiction for large $r$.
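
The orders of magnitude in this example can be checked numerically. The sketch below is an illustration only: the constant `K` is a hypothetical value of the absolute constant, the bound $K e^{-r/K}$ stands for what the assumed inequality would give with $t$ proportional to $r$, and the function names are ours. It compares that exponential upper bound with the polynomial lower bound $\frac14\min(n\mathbb{P}(|X_1| \ge r), 1)$ obtained from Lévy's inequality, taking $n = \lceil e^{r}r^{-2}\rceil$.

```python
import math

def counterexample_bounds(r, K):
    """Quantities of the two-point example for a given r and a hypothetical constant K."""
    p = math.exp(-r)                           # P(|X_1| = r)
    exp_moment = math.exp(r) * p + (1.0 - p)   # E exp(|X_1|), at most 2
    n = math.ceil(math.exp(r) / r**2)          # n of order e^r / r^2
    sigma = math.sqrt(n * r**2 * p)            # sqrt(n) * ||X_1||_2, of order 1
    lower = 0.25 * min(n * p, 1.0)             # Levy: P(|sum| >= r) >= min(n p, 1) / 4
    upper = K * math.exp(-r / K)               # bound the hypothetical inequality would give
    return {"n": n, "exp_moment": exp_moment, "sigma": sigma,
            "lower": lower, "upper": upper}

def first_contradiction(K, r_max=700):
    """Smallest integer r <= r_max at which the lower bound exceeds the upper bound."""
    for r in range(1, r_max + 1):
        b = counterexample_bounds(r, K)
        if b["lower"] > b["upper"]:
            return r
    return None

r0 = first_contradiction(K=2.0)
```

Whatever value of `K` is tried, the lower bound decays like $r^{-2}$ while the upper bound decays exponentially, so `first_contradiction` eventually returns a finite `r` (for very large `K` the default `r_max` may have to be replaced by a computation on the logarithmic scale).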

Remark A small modification of the above argument shows that one cannot hope
for an inequality

\[
\mathbb{P}\Bigl(Z \ge K\bigl(\mathbb{E}Z + \sqrt{t}\,\sigma + t\log^{\beta} n\,\max_i\|\sup_f |f(X_i)|\|_{\psi_1}\bigr)\Bigr) \le Ke^{-t}
\]

with $\beta < 1$. For $\beta = 1$, this inequality follows from Theorem 4 via Pisier's inequality,

\[
\Bigl\|\max_{i\le n}|Y_i|\Bigr\|_{\psi_\alpha} \le K_\alpha \max_{i\le n}\|Y_i\|_{\psi_\alpha}\log^{1/\alpha} n \tag{13}
\]

for independent real variables $Y_1, \ldots, Y_n$.
3 Applications to Markov chains

We will now turn to the other class of inequalities we are interested in. We are again
concerned with random variables of the form

\[
Z = \sup_{f\in\mathcal{F}}\Bigl|\sum_{i=1}^n f(X_i)\Bigr|,
\]

but this time we assume that the class $\mathcal{F}$ is uniformly bounded and we drop the
assumption on the independence of the underlying sequence $X_1, \ldots, X_n$. To be more precise,
we will assume that $X_1, \ldots, X_n$ form a Markov chain, satisfying some additional
conditions, which are rather classical in the Markov chain or Markov Chain Monte Carlo
literature.

The organization of this part is as follows. First, before stating the main results, we
will present all the structural assumptions we will impose on the chain. At the same
time we will introduce some notation, which will be used in the sequel. Next, we present
our results (Theorems 6 and 7) followed by the proof (which is quite straightforward
but technical) and a discussion of the optimality (Section 3.3). At the end, in Section
3.4, we will also present a bounded differences type inequality for Markov chains.

3.1 Assumptions on the Markov chain 

Let $X_1, X_2, \ldots$ be a homogeneous Markov chain on $S$, with transition kernel
$P = P(x, A)$, satisfying the so called minorization condition, stated below.

Minorization condition We assume that there exist positive $m \in \mathbb{N}$, $\delta > 0$, a set
$C \in \mathcal{B}$ ("small set") and a probability measure $\nu$ on $S$ for which

\[
\forall_{x\in C}\ \forall_{A\in\mathcal{B}}\quad P^m(x, A) \ge \delta\nu(A) \tag{14}
\]

and

\[
\forall_{x\in S}\ \exists_n\quad P^{nm}(x, C) > 0, \tag{15}
\]

where $P^i(\cdot, \cdot)$ is the transition kernel for the chain after $i$ steps.

One can show that in such a situation if the chain admits an invariant measure $\pi$,
then this measure is unique and satisfies $\pi(C) > 0$ (see [17]). Moreover, under some
conditions on the initial distribution $\xi$, it can be extended to a new (so called split)
chain $(\tilde{X}_n, R_n) \in S \times \{0, 1\}$, satisfying the following properties.

Properties of the split chain

(P1) $(\tilde{X}_n)_n$ is again a Markov chain with transition kernel $P$ and initial distribution
$\xi$ (hence for our purposes of estimating the tail probabilities we may and will
identify $X_n$ and $\tilde{X}_n$),

(P2) if we define $T_1 = \inf\{n > 0\colon R_{nm} = 1\}$,

\[
T_{i+1} = \inf\{n > 0\colon R_{(T_1+\ldots+T_i+n)m} = 1\},
\]

then $T_1, T_2, \ldots$ are well defined and independent; moreover $T_2, T_3, \ldots$ are i.i.d.,

(P3) if we define $S_i = T_1 + \ldots + T_i$, then the "blocks"

\[
\begin{aligned}
Y_0 &= (X_1, \ldots, X_{mT_1+m-1}), \\
Y_i &= (X_{m(S_i+1)}, \ldots, X_{mS_{i+1}+m-1}), \quad i > 0,
\end{aligned}
\]

form a one-dependent sequence (i.e. for all $i$, $\sigma((Y_j)_{j<i})$ and $\sigma((Y_j)_{j>i})$ are
independent). Moreover, the sequence $Y_1, Y_2, \ldots$ is stationary. If $m = 1$, then the
variables $Y_0, Y_1, \ldots$ are independent.

In consequence, for $f\colon S \to \mathbb{R}$, the variables

\[
Z_i = Z_i(f) = \sum_{j=m(S_i+1)}^{mS_{i+1}+m-1} f(X_j), \quad i \ge 1,
\]

constitute a one-dependent stationary sequence (an i.i.d. sequence if $m = 1$).
Additionally, if $f$ is $\pi$-integrable (recall that $\pi$ is the unique stationary measure
for the chain), then

\[
\mathbb{E}Z_1 = \delta^{-1}\pi(C)^{-1} m \int f\,d\pi. \tag{16}
\]

(P4) the distribution of $T_1$ depends only on $\xi, P, C, \delta, \nu$, whereas the law of $T_2$ only on
$P, C, \delta$ and $\nu$.

We refrain from specifying the construction of this new chain in full generality as
well as conditions under which (14) and (15) hold and refer the reader to the classical
monograph [17] or a survey article [22] for a complete exposition. Here, we will only
sketch the construction for $m = 1$, to give its "flavour". Informally speaking, at each
step $i$, if we have $X_i = x$ and $x \notin C$, we generate the next value of the chain according
to the measure $P(x, \cdot)$. If $x \in C$, then we toss a coin with probability of success equal
to $\delta$. In the case of success ($R_i = 1$), we draw the next sample according to the measure
$\nu$, otherwise ($R_i = 0$), according to

\[
\frac{P(x, \cdot) - \delta\nu(\cdot)}{1 - \delta}.
\]

When $R_i = 1$, one usually says that the chain regenerates, as the distribution in the
next step (for $m = 1$, after $m$ steps in general) is again $\nu$.

Let us remark that for a recurrent chain on a countable state space, admitting a
stationary distribution, the Minorization condition is always satisfied with $m = 1$ and
$\delta = 1$ (for $C$ we can take $\{x\}$, where $x$ is an arbitrary element of the state space). Also
the construction of the split chain becomes trivial.
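
The informal description of the split chain for $m = 1$ can be turned into a few lines of code. The following Python sketch simulates it on a finite state space; it is only an illustration under additional assumptions of ours (states labelled $0, \ldots, d-1$, kernel given as a list of rows, $0 < \delta < 1$), and the three-state kernel, the set `C` and the measure `nu` are toy values chosen so that (14) holds.

```python
import random

def residual_kernel(P, C, delta, nu):
    """Rows (P(x, .) - delta * nu) / (1 - delta) for x in the small set C."""
    if not 0.0 < delta < 1.0:
        raise ValueError("this sketch assumes 0 < delta < 1")
    Q = {}
    for x in C:
        row = [(P[x][y] - delta * nu[y]) / (1.0 - delta) for y in range(len(nu))]
        if min(row) < -1e-12:
            raise ValueError("minorization condition (14) fails at state %d" % x)
        Q[x] = [max(w, 0.0) for w in row]   # clip rounding noise
    return Q

def simulate_split_chain(P, C, delta, nu, x0, n, rng):
    """Return states X_0..X_n and regeneration flags R_0..R_{n-1} (case m = 1)."""
    Q = residual_kernel(P, C, delta, nu)
    states = list(range(len(nu)))
    xs, rs = [x0], []
    for _ in range(n):
        x = xs[-1]
        if x in C and rng.random() < delta:
            rs.append(1)        # success of the coin toss: next state drawn from nu
            weights = nu
        elif x in C:
            rs.append(0)        # failure: next state drawn from the residual kernel
            weights = Q[x]
        else:
            rs.append(0)        # outside C: ordinary transition P(x, .)
            weights = P[x]
        xs.append(rng.choices(states, weights=weights)[0])
    return xs, rs

# Toy three-state chain (all numbers are illustrative assumptions).
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
C, delta, nu = {0, 1}, 0.5, [0.5, 0.5, 0.0]
xs, rs = simulate_split_chain(P, C, delta, nu, x0=2, n=1000, rng=random.Random(0))
```

By construction, for $x \in C$ the mixture $\delta\nu + (1-\delta)Q(x, \cdot)$ equals $P(x, \cdot)$, so the first coordinate `xs` is again a chain with kernel `P`, in line with property (P1); the gaps between consecutive ones in `rs` play the role of the regeneration times $T_i$.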
Before we proceed, let us present the general idea on which our approach is based. To derive
our estimates we will need two types of assumptions.

Regeneration assumption. We will work under the assumption that the chain
admits a representation as above (Properties (P1) to (P4)). We will not however take
advantage of the explicit construction. Instead we will use the properties stated in
the points above. A similar approach is quite common in the literature.

Assumption of the exponential integrability of the regeneration time. To
derive concentration of measure inequalities, we will also assume that $\|T_1\|_{\psi_1} < \infty$
and $\|T_2\|_{\psi_1} < \infty$. At the end of the article we will present examples for which this
assumption is satisfied and relate the obtained inequalities to known results.

The regenerative properties of the chain allow us to decompose the chain into
one-dependent (independent if $m = 1$) blocks of random length, making it possible to reduce
the analysis of the chain to sums of independent random variables (this approach is by
now classical, it has been successfully used in the analysis of limit theorems for Markov
chains, see [17]). Since we are interested in non-asymptotic estimates of exponential
type, we have to impose some additional conditions on the regeneration time, which
would give us control over the random length of one-dependent blocks. This is the
reason for introducing the assumption of the exponential integrability which (after some
technical steps) allows us to apply the inequalities for unbounded empirical processes,
presented in Section 2.

3.2 Main results concerning Markov chains 

Having established all the notation, we are ready to state our main results on Markov 
chains. 

As announced in the introduction, our results depend on the parameter $m$ in the
Minorization condition. If $m = 1$ we are able to obtain tail inequalities for empirical
processes (Theorem 7), whereas for $m > 1$ we have to restrict to linear statistics of
Markov chains (Theorem 6), which formally corresponds to empirical processes indexed
by a singleton. The variables $T_i$ and $Z_i$ appearing in the theorems were defined in the
previous section (see the properties (P2) and (P3) of the split chain).

Theorem 6. Let $X_1, X_2, \ldots$ be a Markov chain with values in $S$, satisfying the
Minorization condition and admitting a (unique) stationary distribution $\pi$. Assume
also that $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} \le \tau$. Consider a function $f\colon S \to \mathbb{R}$, such that $\|f\|_\infty \le a$ and
$\mathbb{E}_\pi f = 0$. Define also the random variable

\[
Z = \sum_{i=1}^n f(X_i).
\]

Then for all $t > 0$,

\[
\mathbb{P}(|Z| > t) \le K\exp\Bigl(-\frac{1}{K}\min\Bigl(\frac{t^2}{n(m\mathbb{E}T_2)^{-1}\mathrm{Var}\,Z_1}, \frac{t}{\tau^2 am\log n}\Bigr)\Bigr). \tag{17}
\]

Theorem 7. Let $X_1, X_2, \ldots$ be a Markov chain with values in $S$, satisfying the
Minorization condition with $m = 1$ and admitting a (unique) stationary distribution
$\pi$. Assume also that $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} \le \tau$. Consider moreover a countable class $\mathcal{F}$ of
measurable functions $f\colon S \to \mathbb{R}$, such that $\|f\|_\infty \le a$ and $\mathbb{E}_\pi f = 0$. Define the random
variable

\[
Z = \sup_{f\in\mathcal{F}}\Bigl|\sum_{i=1}^n f(X_i)\Bigr|
\]

and the "asymptotic weak variance"

\[
\sigma^2 = \sup_{f\in\mathcal{F}}\mathrm{Var}\,Z_1(f)/\mathbb{E}T_2.
\]

Then, for all $t > 1$,

\[
\mathbb{P}(Z \ge K\mathbb{E}Z + t) \le K\exp\Bigl(-\frac{1}{K}\min\Bigl(\frac{t^2}{n\sigma^2}, \frac{t}{\tau^3(\mathbb{E}T_2)^{-1}a\log n}\Bigr)\Bigr).
\]

Remarks

1. As it was mentioned in the previous section, chains satisfying the Minorization
condition admit at most one stationary measure.

2. In Theorem 7 the dependence on the chain is worse than in Theorem 6, i.e. we
have $\tau^3(\mathbb{E}T_2)^{-1}$ instead of $\tau^2$ in the denominator. It is a result of just one step
in the argument we present below; however, at the moment we do not know how
to improve this dependence (or extend the result to $m > 1$).

3. Another remark we would like to make is related to the limit behaviour of the
Markov chain. Let us notice that the asymptotic variance (the variance in the
CLT) for $n^{-1/2}(f(X_1) + \ldots + f(X_n))$ equals

\[
m^{-1}(\mathbb{E}T_2)^{-1}(\mathrm{Var}\,Z_1 + 2\mathbb{E}Z_1Z_2),
\]

which for $m = 1$ reduces to

\[
(\mathbb{E}T_2)^{-1}\mathrm{Var}\,Z_1
\]

(we again refer the reader to [17], Chapter 17 for details). Thus, for $m = 1$ our
estimates reflect the asymptotic behaviour of the variable $Z$.

Let us now pass to the proofs of the above theorems. For a function $f\colon S \to \mathbb{R}$ let
us define

\[
Z_0 = \sum_{i=1}^{(mT_1+m-1)\wedge n} f(X_i)
\]

and recall the variables $S_i$ and

\[
Z_i = Z_i(f) = \sum_{j=m(S_i+1)}^{mS_{i+1}+m-1} f(X_j), \quad i \ge 1,
\]

defined in the previous section (see property (P3) of the split chain). Recall also that
the $Z_i$'s form a one-dependent stationary sequence for $m > 1$ and an i.i.d. sequence for
$m = 1$.

Using this notation, we have

\[
f(X_1) + \ldots + f(X_n) = Z_0 + \ldots + Z_N + \sum_{i=(S_{N+1}+1)m}^{n} f(X_i), \tag{18}
\]

with

\[
N = \sup\{i \in \mathbb{N}\colon mS_{i+1} + m - 1 \le n\}, \tag{19}
\]

where $\sup\emptyset = 0$ (note that $N$ is a random variable). Thus $Z_0$ represents the sum
up to the first regeneration time, then $Z_1, \ldots, Z_N$ are identically distributed blocks
between consecutive regeneration times, included in the interval $[1, n]$, and finally the last
term corresponds to the initial segment of the last block. The sum $Z_1 + \ldots + Z_N$ is
empty if up to time $n$ there has not been any regeneration (i.e. $mT_1 + m - 1 > n$) or
there has been only one regeneration ($mT_1 + m - 1 \le n$ and $m(T_1 + T_2) + m - 1 > n$).
The last sum on the right hand side is empty if there has been no regeneration or the
last 'full' block ends with $n$.
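
The bookkeeping in the decomposition (18), with $N$ given by (19), is easy to get wrong by one index, so a small sketch may be useful. In the Python code below the regeneration times $T_1, T_2, \ldots$ are passed as a plain list (in the paper they come from the split chain; here they are simply inputs), `fx[j-1]` holds $f(X_j)$, and the function name `regeneration_blocks` is ours.

```python
from itertools import accumulate

def regeneration_blocks(fx, T, m=1):
    """Split f(X_1), ..., f(X_n) as in (18); returns (Z_0, [Z_1, ..., Z_N], last sum)."""
    n = len(fx)
    S = [0] + list(accumulate(T))            # S[i] = T_1 + ... + T_i
    if len(S) < 2 or m * S[-1] + m - 1 <= n:
        raise ValueError("T must contain regeneration times reaching beyond n")

    def block_sum(first, last):
        # sum of f(X_j) over first <= j <= last, clipped to 1..n (empty sum is 0)
        first, last = max(first, 1), min(last, n)
        return sum(fx[first - 1:last]) if first <= last else 0

    z0 = block_sum(1, m * S[1] + m - 1)      # up to the first regeneration time
    N = 0                                    # N = sup{i >= 1 : m S_{i+1} + m - 1 <= n}, sup of empty set = 0
    while N + 2 < len(S) and m * S[N + 2] + m - 1 <= n:
        N += 1
    full = [block_sum(m * (S[i] + 1), m * S[i + 1] + m - 1) for i in range(1, N + 1)]
    tail = block_sum(m * (S[N + 1] + 1), n)  # initial segment of the last block
    return z0, full, tail

# Example with n = 20, f(X_j) = j and m = 1.
z0, full, tail = regeneration_blocks(list(range(1, 21)), [3, 2, 4, 5, 10], m=1)
```

In every case the three parts add up to $f(X_1) + \ldots + f(X_n)$, which is the content of (18); the two degenerate situations described above (no regeneration, or only one regeneration before time $n$) both give an empty list of full blocks.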

We will first bound the initial and the last summand in the decomposition (18). To
achieve this we will not need the assumption that $f$ is centered with respect to the
stationary distribution $\pi$. In consequence the same bound may be applied in the proofs of
both Theorem 6 and Theorem 7.

Lemma 2. If $\|T_1\|_{\psi_1} \le \tau$ and $\|f\|_\infty \le a$, then for all $t > 0$,

\[
\mathbb{P}(|Z_0| \ge t) \le 2\exp\Bigl(-\frac{t}{2am\tau}\Bigr).
\]

Proof. We have $|Z_0| \le 2aT_1m$, so by the remark after Definition 1,

\[
\mathbb{P}(|Z_0| \ge t) \le \mathbb{P}(T_1 \ge t/2am) \le 2\exp\Bigl(-\frac{t}{2am\tau}\Bigr).
\]

□
The next lemma provides a similar bound for the last summand on the right hand
side of (18). It is a little bit more complicated, since it involves additional dependence
on the random variable $N$.

Lemma 3. If $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} \le \tau$, then for all $t > 0$,

\[
\mathbb{P}(n - m(S_{N+1} + 1) + 1 > t) \le K\exp\Bigl(-\frac{t}{Km\tau\log\tau}\Bigr).
\]

In consequence, if $\|f\|_\infty \le a$, then

\[
\mathbb{P}\Bigl(\Bigl|\sum_{i=(S_{N+1}+1)m}^{n} f(X_i)\Bigr| > t\Bigr) \le K\exp\Bigl(-\frac{t}{Kam\tau\log\tau}\Bigr).
\]

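Proof. Let us consider the variable $M_n = n - m(S_{N+1} + 1) + 1$. If $M_n > t$ then

\[
S_{N+1} < \frac{n-t+1}{m} - 1.
\]

Therefore

\[
\begin{aligned}
\mathbb{P}(M_n > t) &\le \sum_{k < \frac{n-t+1}{m}-1} \mathbb{P}(S_{N+1} = k)
= \sum_{k < \frac{n-t+1}{m}-1}\ \sum_{l=1}^{k} \mathbb{P}(S_l = k\ \&\ N + 1 = l) \\
&= \sum_{k < \frac{n-t+1}{m}-1}\ \sum_{l=1}^{k} \mathbb{P}\Bigl(S_l = k\ \&\ T_{l+1} > \frac{n+1}{m} - 1 - k\Bigr) \\
&= \sum_{k < \frac{n-t+1}{m}-1}\ \sum_{l=1}^{k} \mathbb{P}(S_l = k)\,\mathbb{P}\Bigl(T_2 > \frac{n+1}{m} - 1 - k\Bigr) \\
&\le \sum_{k=1}^{\lfloor\frac{n-t+1}{m}\rfloor - 1} 2\exp\Bigl(-\tau^{-1}\Bigl(\frac{n+1}{m} - 1 - k\Bigr)\Bigr) \\
&\le 2\exp\Bigl(-\tau^{-1}\Bigl(\frac{n+1}{m} - 1\Bigr)\Bigr)\,
\frac{\exp\bigl(\tau^{-1}\lfloor\frac{n-t+1}{m}\rfloor\bigr)}{\exp(1/\tau) - 1}
\le K\tau\exp\Bigl(-\frac{t}{m\tau}\Bigr),
\end{aligned}
\]

where the first equality follows from the fact that $S_{N+1} \ge N + 1$, the second from the
definition of $N$, the third from the fact that $T_1, T_2, \ldots$ are independent and $T_2, T_3, \ldots$
are i.i.d., and finally the second inequality from the fact that $S_i \ne S_j$ for $i \ne j$ (see the
properties (P2) and (P3) of the split chain).
Let us notice that if $t > 2m\tau\log\tau$, then

\[
K\tau\exp\Bigl(-\frac{t}{m\tau}\Bigr) \le K\exp\Bigl(-\frac{t}{2m\tau}\Bigr) \le K\exp\Bigl(-\frac{t}{Km\tau\log\tau}\Bigr),
\]

where in the last inequality we have used the fact that $\tau > c$ for some universal constant
$c > 1$.

On the other hand, if $t < 2m\tau\log\tau$, then

\[
1 \le e\cdot\exp\Bigl(-\frac{t}{2m\tau\log\tau}\Bigr).
\]

Therefore we obtain for $t > 0$,

\[
\mathbb{P}(M_n > t) \le K\exp\Bigl(-\frac{t}{Km\tau\log\tau}\Bigr),
\]

which proves the first part of the Lemma. Now,

\[
\mathbb{P}\Bigl(\Bigl|\sum_{i=(S_{N+1}+1)m}^{n} f(X_i)\Bigr| > t\Bigr) \le \mathbb{P}(M_n > t/a) \le K\exp\Bigl(-\frac{t}{Kam\tau\log\tau}\Bigr).
\]

□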

Before we proceed with the proof of Theorem 6 and Theorem 7, we would like
to make some additional comments regarding our approach. As already mentioned,
thanks to the property (P3) of the split chain, we may apply to $Z_1 + \ldots + Z_N$ the
inequalities for sums of independent random variables obtained in Section 2 (since for
$m > 1$ we have only one-dependence, we will split the sum, treating even and odd
indices separately). The number of summands is random, but clearly not larger than
$n$. Since the variables $Z_i$ are equidistributed, we can reduce this random sum to a
deterministic one by applying the following maximal inequality by Montgomery-Smith.

Lemma 4. Let $Y_1, \ldots, Y_n$ be i.i.d. Banach space valued random variables. Then for
some universal constant $K$ and every $t > 0$,

\[
\mathbb{P}\Bigl(\max_{k\le n}\Bigl\|\sum_{i=1}^k Y_i\Bigr\| > t\Bigr) \le K\,\mathbb{P}\Bigl(\Bigl\|\sum_{i=1}^n Y_i\Bigr\| > t/K\Bigr).
\]

Remark The use of regeneration methods makes our proof similar to the proof of
the CLT for Markov chains. In this context, the above lemma can be viewed as a
counterpart of the Anscombe theorem (they are quite different statements but both are
used to handle the random number of summands).

One could now apply Lemma 4 directly, using the fact that $N \le n$. Then however
one would not get the asymptotic variance in the exponent (see the remark after Theorem
7). The form of this variance is a consequence of the aforementioned Anscombe theorem
and the fact that by the LLN we have (denoting $N = N_n$ to stress the dependence on
$n$)

\[
\lim_{n\to\infty}\frac{N_n}{n} = \frac{1}{m\mathbb{E}T_2} \quad \text{a.s.} \tag{20}
\]

Therefore to obtain an inequality which at least up to universal constants (and for
$m = 1$) reflects the limiting behaviour of the variable $Z$, we will need a quantitative
version of (20), given in the following lemma.
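
The law of large numbers for the number of full blocks can be observed in a quick simulation. The sketch below is only an illustration under simplifying assumptions of ours: all regeneration times, including $T_1$, are drawn i.i.d. from a geometric distribution with success probability `p` (so that $\mathbb{E}T_2 = 1/p$), and the function names are hypothetical. It computes $N_n$ as in (19) and compares $N_n/n$ with $1/(m\mathbb{E}T_2)$.

```python
import random

def geometric_time(rng, p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t

def count_full_blocks(n, m, p, rng):
    """N = sup{i >= 1 : m*S_{i+1} + m - 1 <= n} for i.i.d. geometric T_1, T_2, ..."""
    s = geometric_time(rng, p)           # S_1 = T_1
    N = 0
    while True:
        s += geometric_time(rng, p)      # s now equals S_{N+2}
        if m * s + m - 1 > n:
            return N
        N += 1

rng = random.Random(12345)
n, m, p = 200_000, 1, 0.25               # E T_2 = 1/p = 4
ratio = count_full_blocks(n, m, p, rng) / n
limit = 1.0 / (m * (1.0 / p))            # right-hand side of the LLN for N_n / n
```

With these parameters the fluctuations of `ratio` around `limit` are of order $n^{-1/2}$, whereas the lemma below controls the (much rarer) event that $N$ exceeds three times its typical value.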

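Lemma 5. If $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} \le \tau$, then

\[
\mathbb{P}(N > \lfloor 3n/(m\mathbb{E}T_2)\rfloor) \le K\exp\Bigl(-\frac{1}{K}\,\frac{n\mathbb{E}T_2}{m\tau^2}\Bigr).
\]

To prove the above estimate, we will use the classical Bernstein's inequality (actually
its version for $\psi_1$ variables).

Lemma 6 (Bernstein's $\psi_1$ inequality, see [25], Lemma 2.2.11 and the subsequent
remark). If $Y_1, \ldots, Y_n$ are independent random variables such that $\mathbb{E}Y_i = 0$ and
$\|Y_i\|_{\psi_1} \le \tau$, then for every $t > 0$,

\[
\mathbb{P}\Bigl(\Bigl|\sum_{i=1}^n Y_i\Bigr| > t\Bigr) \le 2\exp\Bigl(-\frac{1}{K}\min\Bigl(\frac{t^2}{n\tau^2}, \frac{t}{\tau}\Bigr)\Bigr).
\]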

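Proof of Lemma 5. Assume now that $n/(m\mathbb{E}T_2) \ge 1$. We have

$$\begin{aligned}
\mathbb{P}\big(N > \lfloor 3n/(m\mathbb{E}T_2)\rfloor\big)
&\le \mathbb{P}\big(m(T_2 + \ldots + T_{\lfloor 3n/(m\mathbb{E}T_2)\rfloor+1}) \le n\big) \\
&\le \mathbb{P}\Big(\sum_{i=2}^{\lfloor 3n/(m\mathbb{E}T_2)\rfloor+1} (T_i - \mathbb{E}T_2) \le n/m - \lfloor 3n/(m\mathbb{E}T_2)\rfloor\,\mathbb{E}T_2\Big) \\
&\le \mathbb{P}\Big(\sum_{i=2}^{\lfloor 3n/(m\mathbb{E}T_2)\rfloor+1} (T_i - \mathbb{E}T_2) \le n/m - 3n/(2m)\Big) \\
&= \mathbb{P}\Big(\sum_{i=2}^{\lfloor 3n/(m\mathbb{E}T_2)\rfloor+1} (T_i - \mathbb{E}T_2) \le -n/(2m)\Big).
\end{aligned}$$

We have $\|T_2 - \mathbb{E}T_2\|_{\psi_1} \le 2\|T_2\|_{\psi_1} \le 2\tau$, therefore Bernstein's inequality (Lemma 6)
gives

$$\begin{aligned}
\mathbb{P}\big(N > \lfloor 3n/(m\mathbb{E}T_2)\rfloor\big)
&\le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{n^2}{m^2\lfloor 3n/(m\mathbb{E}T_2)\rfloor\,\tau^2}, \frac{n}{m\tau}\Big)\Big) \\
&\le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{n\,\mathbb{E}T_2}{m\tau^2}, \frac{n}{m\tau}\Big)\Big) \\
&= 2\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big),
\end{aligned}$$

where the equality follows from the fact that $\mathbb{E}T_2 \le \tau$. If $n/(m\mathbb{E}T_2) < 1$, then also
$n\mathbb{E}T_2/(m\tau^2) < 1$, thus finally we have

$$\mathbb{P}\big(N > \lfloor 3n/(m\mathbb{E}T_2)\rfloor\big) \le K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big),$$

which proves the lemma. □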
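
As an informal numerical illustration of the limit (20) and of the event controlled in Lemma 5, the following sketch simulates the number of completed regeneration cycles up to time n. It is only a toy model and not part of the argument: it assumes m = 1 and i.i.d. geometric regeneration times with success probability p (so that ET_2 = 1/p). The helper names `geometric` and `regenerations_up_to`, the seed and the parameter values are illustrative choices, not taken from the paper.

```python
import math
import random

def geometric(rng, p):
    """Sample T in {1, 2, ...} with P(T = k) = (1 - p)**(k - 1) * p (toy assumption)."""
    u = 1.0 - rng.random()                       # u lies in (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))

def regenerations_up_to(n, p, rng):
    """Count the indices k >= 1 with T_1 + ... + T_k <= n."""
    total, count = 0, 0
    while True:
        total += geometric(rng, p)
        if total > n:
            return count
        count += 1

rng = random.Random(0)
n, p = 20000, 0.25                               # E T_2 = 1 / p = 4
count = regenerations_up_to(n, p, rng)
ratio = count / n                                # should be close to 1 / E T_2 = p
threshold = math.floor(3 * n * p)                # floor(3n / (m E T_2)) with m = 1
print(ratio, count > threshold)
```

In such a run the ratio is expected to be close to 1/ET_2, while the count stays far below the threshold from Lemma 5, in line with the exponentially small probability stated there.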

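We are now in position to prove Theorem 6.

Proof of Theorem 6. Let us notice that $|Z_i| \le amT_{i+1}$, so for $i \ge 1$, $\|Z_i\|_{\psi_1} \le am\|T_2\|_{\psi_1} \le am\tau$.
Additionally, by (16), for $i \ge 1$, $\mathbb{E}Z_i = 0$. Denote now $R = \lfloor 3n/(m\mathbb{E}T_2)\rfloor$. Lemma
5, Lemma 4 and Theorem 4 (with $\alpha = 1$, combined with Pisier's estimate (13)) give

$$\begin{aligned}
\mathbb{P}(|Z_1 + \ldots + Z_N| > 2t)
&\le \mathbb{P}(|Z_1 + \ldots + Z_N| > 2t \ \&\ N \le R) + K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big) \\
&\le \mathbb{P}\big(|Z_1 + Z_3 + \ldots + Z_{2\lfloor (N-1)/2\rfloor+1}| > t \ \&\ N \le R\big) \\
&\quad + \mathbb{P}\big(|Z_2 + Z_4 + \ldots + Z_{2\lfloor N/2\rfloor}| > t \ \&\ N \le R\big) + K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big) \\
&\le \mathbb{P}\Big(\max_{k \le \lfloor (R-1)/2\rfloor} |Z_1 + Z_3 + \ldots + Z_{2k+1}| > t\Big) \\
&\quad + \mathbb{P}\Big(\max_{k \le \lfloor R/2\rfloor} |Z_2 + \ldots + Z_{2k}| > t\Big) + K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big) \\
&\le K\,\mathbb{P}\big(|Z_1 + Z_3 + \ldots + Z_{2\lfloor (R-1)/2\rfloor+1}| > t/K\big) \\
&\quad + K\,\mathbb{P}\big(|Z_2 + Z_4 + \ldots + Z_{2\lfloor R/2\rfloor}| > t/K\big) + K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big) \\
&\le K\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n(m\mathbb{E}T_2)^{-1}\mathrm{Var}\,Z_1}, \frac{t}{\log(3n/(m\mathbb{E}T_2))\,am\tau}\Big)\Big) \\
&\quad + K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big).
\end{aligned} \tag{21}$$

Combining the above estimate with (18), Lemma 2 and Lemma 3 we obtain

$$\begin{aligned}
\mathbb{P}(|S_n| > 4t)
&\le K\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n(m\mathbb{E}T_2)^{-1}\mathrm{Var}\,Z_1}, \frac{t}{\log(3n/(m\mathbb{E}T_2))\,am\tau}\Big)\Big) \\
&\quad + K\exp\Big(-\frac{1}{K}\,\frac{n\,\mathbb{E}T_2}{m\tau^2}\Big) + 2\exp\Big(\frac{-t}{2am\tau}\Big) + K\exp\Big(\frac{-t}{Kam\tau\log\tau}\Big).
\end{aligned}$$

For $t > na/4$, the left hand side of the above inequality is equal to 0, therefore, using
the fact that $\mathbb{E}T_2 \ge 1$, $\tau > 1$, we obtain (17). □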






The proof of Theorem 7 is quite similar, however it involves some additional
technicalities related to the presence of $\mathbb{E}Z$ in our estimates.

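Proof of Theorem 7. Let us first notice that similarly as in the real valued case, we
have

$$\mathbb{P}\Big(\sup_f |Z_0(f)| \ge t\Big) \le \mathbb{P}(T_1 \ge t/a) \le 2\exp\Big(\frac{-t}{a\tau}\Big),$$

moreover Lemma 3 (applied to the function $x \mapsto \sup_{f\in\mathcal{F}} |f(x)|$) gives

$$\mathbb{P}\Big(\sup_f \Big|\sum_{i=S_{N+1}+1}^{n} f(X_i)\Big| \ge t\Big) \le K\exp\Big(\frac{-t}{Ka\tau\log\tau}\Big).$$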



One can also see that since we assume that $m = 1$, the splitting of $Z_1 + \ldots + Z_N$
into sums over even and odd indices is not necessary (by the property (P3) of the split
chain the summands are independent). Using the fact that Lemma 4 is valid for Banach
space valued variables, we can repeat the argument from the proof of Theorem 6 and
obtain for $R = \lfloor 3n/\mathbb{E}T_2\rfloor$,

$$\mathbb{P}\Big(\sup_f |Z_1(f) + \ldots + Z_N(f)| \ge K\,\mathbb{E}\sup_f \Big|\sum_{i=1}^{R} Z_i(f)\Big| + t\Big) \le K\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n\sigma^2}, \frac{t}{\tau^3(\mathbb{E}T_2)^{-1}a\log n}\Big)\Big).$$

Thus, Theorem 7 will follow if we prove that

$$\mathbb{E}\sup_f \Big|\sum_{i=1}^{R} Z_i(f)\Big| \le K\,\mathbb{E}\sup_f \Big|\sum_{i=1}^{n} f(X_i)\Big| + K\tau^3 a/\mathbb{E}T_2 \tag{22}$$

(recall that $K$ may change from line to line).

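From the triangle inequality, the fact that $Y_i = (X_{S_i+1}, \ldots, X_{S_{i+1}})$, $i \ge 1$, are i.i.d.
and Jensen's inequality it follows that

$$\mathbb{E}\sup_f \Big|\sum_{i=1}^{R} Z_i(f)\Big| \le 12\,\mathbb{E}\sup_f \Big|\sum_{i=1}^{\lceil n/(4\mathbb{E}T_2)\rceil} Z_i(f)\Big| \le 12\,\mathbb{E}\sup_f \Big|\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor} Z_i(f)\Big| + 12a\tau, \tag{23}$$

where in the last inequality we used the fact that $\mathbb{E}\sup_f |Z_i(f)| \le \mathbb{E}aT_{i+1} \le a\tau$.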

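We will split the integral on the right hand side into two parts, depending on the
size of the variable $N$. Let us first consider the quantity

$$\mathbb{E}\sup_f \Big|\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor} Z_i(f)\Big| \mathbf{1}_{\{N < \lfloor n/(4\mathbb{E}T_2)\rfloor\}}.$$

Assume that $n/(4\mathbb{E}T_2) \ge 1$. Then, using Bernstein's inequality, we obtain

$$\begin{aligned}
\mathbb{P}\big(N < \lfloor n/(4\mathbb{E}T_2)\rfloor\big)
&= \mathbb{P}\Big(\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor+1} T_i > n\Big) \\
&\le \mathbb{P}(T_1 > n/2) + \mathbb{P}\Big(\sum_{i=2}^{\lfloor n/(4\mathbb{E}T_2)\rfloor+1} (T_i - \mathbb{E}T_2) > n/2 - \lfloor n/(4\mathbb{E}T_2)\rfloor\,\mathbb{E}T_2\Big) \\
&\le 2e^{-n/(2\tau)} + \mathbb{P}\Big(\sum_{i=2}^{\lfloor n/(4\mathbb{E}T_2)\rfloor+1} (T_i - \mathbb{E}T_2) > n/4\Big) \\
&\le 2e^{-n/(2\tau)} + 2\exp\Big(-\frac{1}{K}\min\Big(\frac{n\,\mathbb{E}T_2}{\tau^2}, \frac{n}{\tau}\Big)\Big) \le Ke^{-n\mathbb{E}T_2/(K\tau^2)}.
\end{aligned}$$

If $n/(4\mathbb{E}T_2) < 1$, the above estimate holds trivially. Therefore

$$\begin{aligned}
\mathbb{E}\sup_f \Big|\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor} Z_i(f)\Big| \mathbf{1}_{\{N < \lfloor n/(4\mathbb{E}T_2)\rfloor\}}
&\le a\,\mathbb{E}\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor} T_{i+1}\mathbf{1}_{\{N < \lfloor n/(4\mathbb{E}T_2)\rfloor\}} \\
&\le an\,\|T_2\|_2 \sqrt{\mathbb{P}\big(N < \lfloor n/(4\mathbb{E}T_2)\rfloor\big)} \\
&\le Ka\tau n\,e^{-n\mathbb{E}T_2/(K\tau^2)} \le Ka\tau^3/\mathbb{E}T_2.
\end{aligned} \tag{24}$$
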
Now we will bound the remaining part, i.e.

$$\mathbb{E}\sup_f \Big|\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor} Z_i(f)\Big| \mathbf{1}_{\{N \ge \lfloor n/(4\mathbb{E}T_2)\rfloor\}}.$$



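Recall that $Y_0 = (X_1, \ldots, X_{T_1})$, $Y_i = (X_{S_i+1}, \ldots, X_{S_{i+1}})$ for $i \ge 1$ and consider a
filtration $(\mathcal{F}_i)_{i \ge 0}$ defined as

$$\mathcal{F}_i = \sigma(Y_0, \ldots, Y_i),$$

where we regard the blocks $Y_i$ as random variables with values in the disjoint union
$\bigcup_{i=1}^{\infty} S^i$, with the natural $\sigma$-field, i.e. the $\sigma$-field generated by $\bigcup_{i=1}^{\infty} \mathcal{B}^{\otimes i}$ (recall that $\mathcal{B}$
denotes our $\sigma$-field of reference in $S$).

Let us further notice that $T_i$ is measurable with respect to $\sigma(Y_{i-1})$ for $i \ge 1$. We
have for $i \ge 1$,

$$\{N + 1 \le i\} = \{T_1 + \ldots + T_{i+1} > n\} \in \mathcal{F}_i$$

and $\{N + 1 \le 0\} = \emptyset$, so $N + 1$ is a stopping time with respect to the filtration $(\mathcal{F}_i)$.
Thus we have

$$\begin{aligned}
\mathbb{E}\sup_f \Big|\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor} Z_i(f)\Big| \mathbf{1}_{\{N \ge \lfloor n/(4\mathbb{E}T_2)\rfloor\}}
&= \mathbb{E}\sup_f \Big|\sum_{i=1}^{\lfloor n/(4\mathbb{E}T_2)\rfloor \wedge (N+1)} Z_i(f)\Big| \mathbf{1}_{\{N+1 > \lfloor n/(4\mathbb{E}T_2)\rfloor\}} \\
&\le \mathbb{E}\Big[\mathbb{E}\Big(\sup_f \Big|\sum_{i=1}^{N+1} Z_i(f)\Big| \,\Big|\, \mathcal{F}_{\lfloor n/(4\mathbb{E}T_2)\rfloor \wedge (N+1)}\Big) \mathbf{1}_{\{N+1 > \lfloor n/(4\mathbb{E}T_2)\rfloor\}}\Big] \\
&= \mathbb{E}\sup_f \Big|\sum_{i=1}^{N+1} Z_i(f)\Big| \mathbf{1}_{\{N+1 > \lfloor n/(4\mathbb{E}T_2)\rfloor\}} \\
&\le a\tau + \mathbb{E}\sup_f \Big|\sum_{i=0}^{N+1} Z_i(f)\Big| \mathbf{1}_{\{N+1 > \lfloor n/(4\mathbb{E}T_2)\rfloor\}} \\
&\le a\tau + \mathbb{E}\sup_f \Big|\sum_{i=1}^{n} f(X_i)\Big| + \mathbb{E}\sup_f \Big|\sum_{i=n+1}^{S_{N+2}} f(X_i)\Big|,
\end{aligned}$$

where in the first inequality we used Doob's optional sampling theorem together with
the fact that $\sup_f |\sum_{i=1}^{k} Z_i(f)|$ is a submartingale with respect to $(\mathcal{F}_k)$ (notice that
$Z_i(f)$ is measurable with respect to $\sigma(Y_i)$ for $i \in \mathbb{N}$ and $f \in \mathcal{F}$). The second equality
follows from the fact that $\{N + 1 > \lfloor n/(4\mathbb{E}T_2)\rfloor\} \in \mathcal{F}_{\lfloor n/(4\mathbb{E}T_2)\rfloor \wedge (N+1)}$. Indeed for
$i \ge \lfloor n/(4\mathbb{E}T_2)\rfloor$, we have

$$\{N + 1 > \lfloor n/(4\mathbb{E}T_2)\rfloor \ \&\ \lfloor n/(4\mathbb{E}T_2)\rfloor \wedge (N + 1) \le i\} = \{N + 1 > \lfloor n/(4\mathbb{E}T_2)\rfloor\} \in \mathcal{F}_{\lfloor n/(4\mathbb{E}T_2)\rfloor} \subset \mathcal{F}_i,$$

whereas for $i < \lfloor n/(4\mathbb{E}T_2)\rfloor$, this set is empty.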



Now, combining the above estimate with (23) and (24) and taking into account the
inequality $\tau \ge \mathbb{E}T_2 \ge 1$, it is easy to see that to finish the proof of (22) it is enough to
show that

$$\mathbb{E}\sup_f \Big|\sum_{i=n+1}^{S_{N+2}} f(X_i)\Big| \le Ka\tau^3/\mathbb{E}T_2. \tag{25}$$

This in turn will follow if we prove that $\mathbb{E}(S_{N+2} - n) \le K\tau^3/\mathbb{E}T_2$.

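Recall now the first part of Lemma 3, stating under our assumptions ($m = 1$) that
$\mathbb{P}(n - S_{N+1} > t) \le K\exp(-t/(K\tau\log\tau))$ for $t \ge 0$. We have

$$\begin{aligned}
\mathbb{P}(S_{N+2} - n > t)
&\le \mathbb{P}(n - S_{N+1} > t) + \mathbb{P}(n - S_{N+1} \le t \ \&\ S_{N+2} - n > t \ \&\ N > 0) \\
&\quad + \mathbb{P}(S_{N+2} - n > t \ \&\ N = 0) \\
&\le Ke^{-t/(K\tau\log\tau)} + \sum_{k=0}^{\lfloor t\rfloor \wedge n} \mathbb{P}(S_{N+1} = n - k \ \&\ T_{N+2} > t + k \ \&\ N > 0) + \mathbb{P}(T_1 + T_2 > t) \\
&\le Ke^{-t/(K\tau\log\tau)} + \sum_{k=0}^{\lfloor t\rfloor \wedge n}\sum_{l=2}^{n-k} \mathbb{P}(S_l = n - k \ \&\ T_{l+1} > t + k) + 2e^{-t/(2\tau)} \\
&= Ke^{-t/(K\tau\log\tau)} + \sum_{k=0}^{\lfloor t\rfloor \wedge n}\sum_{l=2}^{n-k} \mathbb{P}(S_l = n - k)\,\mathbb{P}(T_2 > t + k) + 2e^{-t/(2\tau)} \\
&\le Ke^{-t/(K\tau\log\tau)} + 2(t + 1)e^{-t/\tau} + 2e^{-t/(2\tau)} \le Ke^{-t/(K\tau\log\tau)}.
\end{aligned}$$

This implies that $\mathbb{E}(S_{N+2} - n) \le K\tau\log\tau \le K\tau^3/\mathbb{E}T_2$, which proves (25). Thus
(22) is shown and Theorem 7 follows. □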

Remark Two natural questions to ask in regard to Theorem 7 are first whether the
constant $K$ in front of the expectation can be reduced to $1 + \eta$ (as in Massart's Theorem
2 or Theorem 3) and second, whether one can reduce the constant $K$ in the Gaussian
part to $2(1 + \delta)$ (as in Theorem 4).

3.3 Another counterexample 

If we do not pay attention to constants, the main difference between inequalities
presented in the previous section and the classical Bernstein's inequality for sums of i.i.d.
bounded variables is the presence of the additional factor $\log n$. We would now like
to argue that under the assumptions of Theorems 6 and 7 this additional factor is
indispensable.

To be more precise, we will construct a Markov chain on a countable state space,
satisfying the assumptions of Theorem 6 with $m = 1$ and such that for $\beta < 1$, there is
no constant $K$, such that

$$\mathbb{P}(|f(X_1) + \ldots + f(X_n)| \ge t) \le K\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n\,\mathrm{Var}(Z_1(f))}, \frac{t}{\log^{\beta} n}\Big)\Big) \tag{26}$$

for all $n$ and all functions $f\colon S \to \mathbb{R}$, with $\|f\|_\infty \le 1$ and $\mathbb{E}_\pi f = 0$.
The state space of the chain will be the set

$$S = \{0\} \cup \bigcup_{n=1}^{\infty} \big(\{n\} \times \{1, 2, \ldots, n\} \times \{+1, -1\}\big).$$

The transition probabilities are as follows

$$\begin{aligned}
p_{(n,k,s),(n,k+1,s)} &= 1 \quad \text{for } n = 1, 2, \ldots,\ k = 1, 2, \ldots, n-1,\ s = -1, +1, \\
p_{(n,n,s),0} &= 1 \quad \text{for } n = 1, 2, \ldots,\ s = -1, +1, \\
p_{0,(n,1,s)} &= \frac{1}{2A}e^{-n} \quad \text{for } n = 1, 2, \ldots,\ s = -1, +1,
\end{aligned}$$

where $A = \sum_{n=1}^{\infty} e^{-n}$. In other words, whenever a "particle" is at 0, it chooses one of
countably many loops and travels deterministically along it until the next return to 0.
It is easy to check that this chain has a stationary distribution $\pi$, given by

$$\pi_0 = \frac{A}{A + \sum_{n=1}^{\infty} ne^{-n}}, \qquad \pi_{(n,i,s)} = \frac{1}{2A}e^{-n}\pi_0.$$

The chain satisfies the minorization condition (14) with $C = \{0\}$, $\nu(\{x\}) = p_{0,x}$,
$\delta = 1$ and $m = 1$. The random variable $T_1$ is now just the time of the first visit to 0
and $T_2, T_3, \ldots$ indicate the time between consecutive visits to 0. Moreover

$$\mathbb{P}(T_2 = n) = \frac{e^{-n}}{A},$$

so $\|T_2\|_{\psi_1} < \infty$. If we start the chain from initial distribution $\nu$, then $T_1$ has the same
law as $T_2$, so $\tau = \|T_1\|_{\psi_1} = \|T_2\|_{\psi_1}$.
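
The loop chain described above is easy to simulate, which can help in building intuition for the counterexample. The sketch below is only an illustration under our own encoding (the origin is the integer 0, other states are tuples `(n, k, s)`); the names `sample_loop` and `step`, the seed and the run length are illustrative choices. It compares the empirical fraction of time spent at the origin with the reciprocal of the mean return time of the simulated chain, which equals (e - 1)/(2e - 1) for this encoding.

```python
import math
import random

A = 1.0 / (math.e - 1.0)          # A = sum_{n >= 1} e**(-n)

def sample_loop(rng):
    """Pick a loop: length n with probability e**(-n) / A and sign s = +1 or -1."""
    u = 1.0 - rng.random()          # u lies in (0, 1]
    n = 1 + int(-math.log(u))       # P(n >= k + 1) = e**(-k), so P(n = k) = e**(-k) / A
    s = rng.choice((1, -1))
    return n, s

def step(state, rng):
    """One transition of the chain; the origin is encoded as the integer 0."""
    if state == 0:
        n, s = sample_loop(rng)
        return (n, 1, s)
    n, k, s = state
    return (n, k + 1, s) if k < n else 0

rng = random.Random(1)
steps, visits, state = 200000, 0, 0
for _ in range(steps):
    state = step(state, rng)
    visits += (state == 0)

freq_origin = visits / steps
# In this simulation a cycle through loop n takes n + 1 steps, so the mean return
# time to 0 is 1 + E[n] = (2e - 1) / (e - 1); its reciprocal is the long-run
# fraction of time spent at the origin.
expected = (math.e - 1.0) / (2.0 * math.e - 1.0)
print(freq_origin, expected)
```

The same `step` function can be reused to generate trajectories on which functions such as the f_r considered in this section are evaluated.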

Let us now assume that for some $\beta < 1$, there is a constant $K$, such that (26)
holds. Since we work with a fixed chain, in what follows we will use the letter $K$ also to
denote constants depending on our chain (the value of $K$ may again differ at different
occurrences).

We can in particular apply (26) to the function $f = f_r$ (where $r$ is a large integer),
given by the formula

$$f(0) = 0, \qquad f((n, i, s)) = s\mathbf{1}_{\{n \ge r\}}.$$

We have $\mathbb{E}_\pi f_r = 0$. Moreover

$$\mathrm{Var}(Z_1(f_r)) = \sum_{n=r}^{\infty} n^2 e^{-n}A^{-1} \le Kr^2e^{-r}.$$
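
The variance bound can be checked numerically. The following sketch is only a sanity check, not part of the proof: it evaluates the series for Var(Z_1(f_r)) by truncation and prints its ratio to r^2 e^{-r} over a range of r. The helper name `var_z1`, the truncation length and the range of r are illustrative choices.

```python
import math

A = 1.0 / (math.e - 1.0)                     # A = sum_{n >= 1} e**(-n)

def var_z1(r, terms=400):
    """Var(Z_1(f_r)) = sum_{n >= r} n**2 * e**(-n) / A, truncated after `terms` summands."""
    return sum(n * n * math.exp(-n) for n in range(r, r + terms)) / A

# Ratio of the variance to r**2 * e**(-r); a bounded ratio is what the estimate asserts.
ratios = [var_z1(r) / (r * r * math.exp(-r)) for r in range(1, 41)]
print(max(ratios), ratios[-1])
```

The ratio decreases in r and stays bounded, which is consistent with the existence of a constant K depending only on the chain.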

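Therefore (26) gives

$$\mathbb{P}\big(|f_r(X_1) + \ldots + f_r(X_n)| \ge K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n)\big) \le e^{-t} \tag{27}$$

for $t \ge 1$ and $n \in \mathbb{N}$.
Recall that $S_i = T_1 + \ldots + T_i$. By Bernstein's inequality (Lemma 6), we have for
large $n$,

$$\begin{aligned}
\mathbb{P}(S_{\lceil n/(3\mathbb{E}T_2)\rceil} > n)
&= \mathbb{P}(T_1 + \ldots + T_{\lceil n/(3\mathbb{E}T_2)\rceil} > n) \\
&= \mathbb{P}\Big(\sum_{i=1}^{\lceil n/(3\mathbb{E}T_2)\rceil} (T_i - \mathbb{E}T_i) > n - \lceil n/(3\mathbb{E}T_2)\rceil\,\mathbb{E}T_2\Big) \\
&\le \mathbb{P}\Big(\sum_{i=1}^{\lceil n/(3\mathbb{E}T_2)\rceil} (T_i - \mathbb{E}T_i) > n/2\Big) \\
&\le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{n}{\|T_2\|_{\psi_1}^2}, \frac{n}{\|T_2\|_{\psi_1}}\Big)\Big) \le 2e^{-n/K}.
\end{aligned}$$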



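From the above estimate, for some integer $L$ and $n$ large enough, divisible by $L$,

$$\begin{aligned}
&\mathbb{P}\Big(\Big|\sum_{i=0}^{n/L} Z_i(f_r)\Big| \ge 2K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n)\Big) \\
&\le 2e^{-n/K} + \mathbb{P}\Big(\Big|\sum_{i=0}^{n/L} Z_i(f_r)\Big| \ge 2K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n) \ \&\ S_{n/L+1} \le n\Big) \\
&\le 2e^{-n/K} + \mathbb{P}\Big(\Big|\sum_{i=1}^{n} f_r(X_i)\Big| \ge K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n) \ \&\ S_{n/L+1} \le n\Big) \\
&\quad + \mathbb{P}\Big(\Big|\sum_{i=S_{n/L+1}+1}^{n} f_r(X_i)\Big| \ge K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n) \ \&\ S_{n/L+1} \le n\Big) \\
&\le 2e^{-n/K} + e^{-t} + \sum_{k \le n} \mathbb{P}\Big(\Big|\sum_{i=S_{n/L+1}+1}^{n} f_r(X_i)\Big| \ge K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n) \ \&\ S_{n/L+1} = k\Big) \\
&= 2e^{-n/K} + e^{-t} + \sum_{k \le n} \mathbb{P}\Big(\Big|\sum_{i=1}^{n-k} f_r(X_i)\Big| \ge K(re^{-r/2}\sqrt{nt} + t\log^{\beta} n)\Big)\,\mathbb{P}(S_{n/L+1} = k) \\
&\le 2e^{-n/K} + e^{-t} + e^{-t}\sum_{k \le n} \mathbb{E}\mathbf{1}_{\{S_{n/L+1} = k\}} \le 2e^{-n/K} + 2e^{-t},
\end{aligned}$$

where in the third and fourth inequality we used (27) and in the equality, the Markov
property.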

For $n \simeq r^{-2}e^{r}$ and $t \ge 1$, we obtain

$$\mathbb{P}\big(|Z_0(f_r) + \ldots + Z_{n/L}(f_r)| \ge Kt\log^{\beta} n\big) \le 2e^{-t} + 2e^{-n/K}. \tag{28}$$




On the other hand we have

$$\mathbb{P}(|Z_i(f_r)| \ge r) \ge \frac{1}{2A}e^{-r}.$$

Therefore $\mathbb{P}(\max_{i \le n/L} |Z_i(f_r)| \ge r) \ge 2^{-1}\min(ne^{-r}/(2AL), 1)$. Since $Z_i(f_r)$ are
symmetric, by Lévy's inequality, we get

$$2\,\mathbb{P}(|Z_0(f_r) + \ldots + Z_{n/L}(f_r)| \ge r) \ge \frac{1}{2}\min(ne^{-r}/(2AL), 1) \ge \frac{1}{Kr^2},$$

whereas (28) applied for $t = K^{-1}r/\log^{\beta} n \ge K^{-1}r^{1-\beta} \ge 1$ gives

$$\mathbb{P}(|Z_0(f_r) + \ldots + Z_{n/L}(f_r)| \ge r) \le 2e^{-r^{1-\beta}/K} + 2e^{-e^{r}/(Kr^2)},$$

which gives a contradiction.

3.4 A bounded difference type inequality for symmetric functions 

Now we will present an inequality for more general statistics of the chain. Under the
same assumptions on the chain as above (with an additional restriction that $m = 1$),
we will prove a version of the bounded difference inequality for symmetric functions
(see e.g. [11] for the classical i.i.d. case).

Let us consider a measurable function $f\colon S^n \to \mathbb{R}$ which is invariant under
permutations of arguments, i.e.

$$f(x_1, \ldots, x_n) = f(x_{\sigma_1}, \ldots, x_{\sigma_n}) \tag{29}$$

for all permutations $\sigma$ of the set $\{1, \ldots, n\}$.

Let us also assume that $f$ is $L$-Lipschitz with respect to the Hamming distance, i.e.

$$|f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n)| \le L\,\#\{i\colon x_i \ne y_i\}. \tag{30}$$

Then we have the following

Theorem 8. Let $X_1, X_2, \ldots$ be a Markov chain with values in $S$, satisfying the
Minorization condition with $m = 1$ and admitting a (unique) stationary distribution $\pi$.
Assume also that $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} \le \tau$. Then for every function $f\colon S^n \to \mathbb{R}$, satisfying
(29) and (30), we have

$$\mathbb{P}(|f(X_1, \ldots, X_n) - \mathbb{E}f(X_1, \ldots, X_n)| \ge t) \le 2\exp\Big(-\frac{1}{K}\,\frac{t^2}{nL^2\tau^2}\Big)$$

for all $t \ge 0$.
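
A simple statistic satisfying both (29) and (30) with L = 1 is the number of distinct values in the sample. The sketch below is a hypothetical illustration that is not taken from the paper: it checks permutation invariance and the Hamming-Lipschitz property on random integer sequences. The function names, the alphabet size and the seed are illustrative choices.

```python
import random

def distinct_count(xs):
    """Number of distinct values among x_1, ..., x_n (symmetric, 1-Lipschitz in Hamming distance)."""
    return len(set(xs))

def hamming(xs, ys):
    """Number of coordinates in which the two sequences differ."""
    return sum(x != y for x, y in zip(xs, ys))

rng = random.Random(2)
n, checks_ok = 30, True
for _ in range(1000):
    xs = [rng.randrange(10) for _ in range(n)]
    ys = [x if rng.random() < 0.8 else rng.randrange(10) for x in xs]
    perm = xs[:]
    rng.shuffle(perm)
    symmetric = distinct_count(perm) == distinct_count(xs)                       # condition (29)
    lipschitz = abs(distinct_count(xs) - distinct_count(ys)) <= hamming(xs, ys)  # condition (30), L = 1
    checks_ok = checks_ok and symmetric and lipschitz
print(checks_ok)
```

Changing one coordinate changes the number of distinct values by at most one, which is why L = 1 works here. A statistic that depends on the order of the observations would violate (29) and fall outside the scope of the theorem.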

To prove the above theorem, we will need the following

Lemma 7. Let $\varphi\colon \mathbb{R} \to \mathbb{R}$ be a convex function and $G = f(Y_1, \ldots, Y_n)$, where $Y_1, \ldots, Y_n$
are independent random variables with values in a measurable space $\mathcal{E}$ and $f\colon \mathcal{E}^n \to \mathbb{R}$
is a measurable function. Denote

$$G_i = f(Y_1, \ldots, Y_{i-1}, \tilde{Y}_i, Y_{i+1}, \ldots, Y_n),$$

where $(\tilde{Y}_1, \ldots, \tilde{Y}_n)$ is an independent copy of $(Y_1, \ldots, Y_n)$. Assume moreover that

$$|G - G_i| \le F_i(Y_i, \tilde{Y}_i)$$

for some functions $F_i\colon \mathcal{E}^2 \to \mathbb{R}$, $i = 1, \ldots, n$. Then

$$\mathbb{E}\varphi(G - \mathbb{E}G) \le \mathbb{E}\varphi\Big(\sum_{i=1}^{n} \varepsilon_i F_i(Y_i, \tilde{Y}_i)\Big), \tag{31}$$

where $\varepsilon_1, \ldots, \varepsilon_n$ is a sequence of independent Rademacher variables, independent of
$(Y_i)_{i=1}^{n}$ and $(\tilde{Y}_i)_{i=1}^{n}$.

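Proof. Induction with respect to $n$. For $n = 0$ the statement is obvious, since both the
left-hand and the right-hand side of (31) equal $\varphi(0)$. Let us therefore assume that the
lemma is true for $n - 1$. Then, denoting by $\mathbb{E}_X$ integration with respect to the variable
$X$,

$$\begin{aligned}
\mathbb{E}\varphi(G - \mathbb{E}G) &= \mathbb{E}\varphi(G - \mathbb{E}_{\tilde{Y}_n}G_n + \mathbb{E}_{Y_n}G - \mathbb{E}G) \\
&\le \mathbb{E}\varphi(G - G_n + \mathbb{E}_{Y_n}G - \mathbb{E}G) = \mathbb{E}\varphi(G_n - G + \mathbb{E}_{Y_n}G - \mathbb{E}G) \\
&= \mathbb{E}\varphi(\varepsilon_n|G - G_n| + \mathbb{E}_{Y_n}G - \mathbb{E}G) \\
&\le \mathbb{E}\varphi(\varepsilon_n F_n(Y_n, \tilde{Y}_n) + \mathbb{E}_{Y_n}G - \mathbb{E}G),
\end{aligned}$$

where the equalities follow from the symmetry and the last inequality from the
contraction principle (or simply convexity of $\varphi$), applied conditionally on $(Y_i)_i$, $(\tilde{Y}_i)_i$. Now,
denoting $Z = \mathbb{E}_{Y_n}G$, $Z_i = \mathbb{E}_{Y_n}G_i$, we have for $i = 1, \ldots, n - 1$,

$$|Z - Z_i| = |\mathbb{E}_{Y_n}G - \mathbb{E}_{Y_n}G_i| \le \mathbb{E}_{Y_n}|G - G_i| \le F_i(Y_i, \tilde{Y}_i),$$

and thus for fixed $Y_n$, $\tilde{Y}_n$ and $\varepsilon_n$, we can apply the induction assumption to the function
$t \mapsto \varphi(\varepsilon_n F_n(Y_n, \tilde{Y}_n) + t)$ instead of $\varphi$ and $\mathbb{E}_{Y_n}G$ instead of $G$, to obtain

$$\mathbb{E}\varphi(G - \mathbb{E}G) \le \mathbb{E}\varphi\Big(\sum_{i=1}^{n} F_i(Y_i, \tilde{Y}_i)\varepsilon_i\Big).$$

□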

Lemma 8. In the setting of Lemma 7, if for all $i$, $\|F_i(Y_i, \tilde{Y}_i)\|_{\psi_1} \le \tau$, then for all
$t \ge 0$,

$$\mathbb{P}(|f(Y_1, \ldots, Y_n) - \mathbb{E}f(Y_1, \ldots, Y_n)| \ge t) \le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{n\tau^2}, \frac{t}{\tau}\Big)\Big).$$

Proof. For $p \ge 1$,

$$\|f(Y_1, \ldots, Y_n) - \mathbb{E}f(Y_1, \ldots, Y_n)\|_p \le \Big\|\sum_{i=1}^{n} \varepsilon_i F_i(Y_i, \tilde{Y}_i)\Big\|_p \le K(\sqrt{pn}\,\tau + p\tau),$$

where the first inequality follows from Lemma 7 and the second one from Bernstein's
inequality (Lemma 6) and integration by parts. Now, by the Chebyshev inequality we
get

$$\mathbb{P}\big(|f(Y_1, \ldots, Y_n) - \mathbb{E}f(Y_1, \ldots, Y_n)| \ge K(\sqrt{tn} + t)\tau\big) \le e^{-t}$$

for $t \ge 1$, which is up to the constant in the exponent equivalent to the statement of
the lemma (note that if we can change the constant in the exponent, the choice of the
constant in front of the exponent is arbitrary, provided it is bigger than 1). □
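
The passage from moment bounds to the tail estimate in the proof above can be spelled out as follows. This is a short sketch in our own notation, with W standing for the centered statistic and the explicit factor e absorbed into the constant K of the proof.

```latex
% Notation (ours): W := f(Y_1,\dots,Y_n) - \mathbb{E} f(Y_1,\dots,Y_n),
% with \|W\|_p \le K(\sqrt{pn} + p)\tau for all p \ge 1 and \|W\|_p > 0.
\mathbb{P}\bigl(|W| \ge e\,\|W\|_p\bigr)
  \le \frac{\mathbb{E}|W|^p}{e^{p}\,\|W\|_p^{p}} = e^{-p},
\qquad\text{hence, taking } p = t \ge 1,\qquad
\mathbb{P}\bigl(|W| \ge eK(\sqrt{tn} + t)\tau\bigr) \le e^{-t}.
```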

Proof of Theorem 8. Consider a disjoint union

$$\mathcal{E} = \bigcup_{i=1}^{\infty} S^i$$

and a function $\tilde{f}\colon \mathcal{E}^n \to \mathbb{R}$ defined as

$$\tilde{f}(y_1, \ldots, y_n) = f(x_1, \ldots, x_n),$$

where $x_i$'s are defined by the condition

$$\begin{aligned}
y_1 &= (x_1, \ldots, x_{t_1}) \in S^{t_1}, \\
y_2 &= (x_{t_1+1}, \ldots, x_{t_1+t_2}) \in S^{t_2}, \\
&\ \ \vdots \\
y_n &= (x_{t_1+\ldots+t_{n-1}+1}, \ldots, x_{t_1+\ldots+t_n}) \in S^{t_n}.
\end{aligned} \tag{32}$$

Let now $T_1, \ldots, T_n$ be the regeneration times of the chain and set

$$Y_i = (X_{T_1+\ldots+T_{i-1}+1}, \ldots, X_{T_1+\ldots+T_i})$$

for $i = 1, \ldots, n$ (we change the enumeration with respect to previous sections, but there
is no longer need to distinguish the initial block). Then $Y_1, \ldots, Y_n$ are independent
$\mathcal{E}$-valued random variables (recall the assumption $m = 1$). Moreover we have

$$f(X_1, \ldots, X_n) = \tilde{f}(Y_1, \ldots, Y_n).$$

Let now $\tilde{Y}_1, \ldots, \tilde{Y}_n$ be an independent copy of the sequence $Y_1, \ldots, Y_n$. Define $G$ and
$G_i$ like in Lemma 7 (for the function $\tilde{f}$). Define also $\tilde{T}_i = j$ iff $\tilde{Y}_i \in S^j$ and let
$X_{i,1}, \ldots, X_{i,T_1+\ldots+T_{i-1}+\tilde{T}_i+T_{i+1}+\ldots+T_n}$ correspond to $Y_1, \ldots, Y_{i-1}, \tilde{Y}_i, Y_{i+1}, \ldots, Y_n$ in the
same way as in (32). Let us notice that we can rearrange the sequence $(X_{i,1}, \ldots, X_{i,n})$ in
such a way that the Hamming distance of the new sequence from $(X_1, \ldots, X_n)$ will not
exceed $\max(T_i, \tilde{T}_i)$. Since the function $f$ is invariant under permutation of arguments
and $L$-Lipschitz with respect to the Hamming distance, we have

$$|G - G_i| \le L\max(T_i, \tilde{T}_i) =: F(Y_i, \tilde{Y}_i).$$

Moreover, $\|F(Y_i, \tilde{Y}_i)\|_{\psi_1} \le 2L\tau$, so by Lemma 8 we obtain

$$\begin{aligned}
\mathbb{P}(|f(X_1, \ldots, X_n) - \mathbb{E}f(X_1, \ldots, X_n)| \ge t)
&= \mathbb{P}(|\tilde{f}(Y_1, \ldots, Y_n) - \mathbb{E}\tilde{f}(Y_1, \ldots, Y_n)| \ge t) \\
&\le 2\exp\Big(-\frac{1}{K}\min\Big(\frac{t^2}{nL^2\tau^2}, \frac{t}{L\tau}\Big)\Big).
\end{aligned}$$

But from Jensen's inequality and (30) it follows that $|f(X_1, \ldots, X_n) - \mathbb{E}f(X_1, \ldots, X_n)| \le Ln$,
thus for $t > Ln$, the left hand side of the above inequality is equal to 0, whereas
for $t \le Ln$, the inequality $\tau > 1$ gives

$$\frac{t^2}{nL^2\tau^2} \le \frac{t}{L\tau},$$

which proves the theorem. □

3.5 A few words on connections with other results 

First we would like to comment on the assumptions of our main theorems, concerning
Markov chains. We assume that the Orlicz norms $\|T_1\|_{\psi_1}$ and $\|T_2\|_{\psi_1}$ are finite, which
is equivalent to existence of a number $\kappa > 1$, such that

$$\mathbb{E}_\xi \kappa^{T_1} < \infty, \qquad \mathbb{E}_\nu \kappa^{T_1} < \infty,$$

where $\xi$ is the initial distribution of the chain and $\nu$ is the minorizing measure from
condition (14). This is true for instance if $m = 1$ and the chain satisfies the drift
condition, i.e. if there is a measurable function $V\colon S \to [1, \infty)$, together with constants
$\lambda < 1$ and $K < \infty$, such that

$$PV(x) = \int_S V(y)P(x, dy) \le \begin{cases} \lambda V(x) & \text{for } x \notin C, \\ K & \text{for } x \in C \end{cases}$$

and $V$ is $\xi$ and $\nu$ integrable (see e.g. [1], Propositions 4.1 and 4.4, see also [22], [17]).
For $m > 1$ one can similarly consider the kernel $P^m$ instead of $P$ (however in this case
our inequalities are restricted to averages of real valued functions as in Theorem 6).
Such drift conditions have gained considerable attention in the Markov Chain Monte
Carlo theory as they imply geometric ergodicity of the chain.
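
To make the drift condition concrete, the following sketch checks it numerically for a toy example that is not taken from the paper: a reflected nearest-neighbour random walk on {0, 1, 2, ...} with probability p < 1/2 of stepping right, C = {0} and the test function V(x) = kappa^x. The chain, the choice of kappa and all names in the code are illustrative assumptions.

```python
import math

p = 0.3                                   # probability of a step to the right (toy assumption, p < 1/2)
kappa = math.sqrt((1.0 - p) / p)          # base of the test function V(x) = kappa**x >= 1
lam = 2.0 * math.sqrt(p * (1.0 - p))      # candidate contraction factor, < 1 when p != 1/2

def V(x):
    return kappa ** x

def PV(x):
    """(PV)(x) for the reflected walk on {0, 1, 2, ...} with C = {0}."""
    if x == 0:
        return p * V(1) + (1.0 - p) * V(0)
    return p * V(x + 1) + (1.0 - p) * V(x - 1)

# Outside C the drift inequality PV(x) <= lam * V(x) should hold (up to rounding).
drift_ok = all(PV(x) <= lam * V(x) * (1.0 + 1e-12) for x in range(1, 60))
K_on_C = PV(0)                            # the finite bound on C
print(lam, drift_ok, K_on_C)
```

For this walk the inequality outside C holds with equality, since p*kappa + (1 - p)/kappa = 2*sqrt(p(1 - p)). Any kappa between 1 and (1 - p)/p would also give a contraction factor below 1.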

Concentration of measure inequalities for general functions of Markov chains were
investigated by Marton [13], Samson [23] and more recently by Kontorovich and
Ramanan [8]. They actually consider more general mixing processes and give estimates
on the deviation of a random variable from the mean or median in terms of mixing
coefficients. When specialized to Markov chains, their estimates yield inequalities in
the spirit of Theorem 8 for general (non-necessarily symmetric) functions of uniformly
ergodic Markov chains (see [17], Chapter 16 for the definition). To obtain their results,
Marton and Samson used transportation inequalities, whereas Kontorovich's and
Ramanan's approach was based on martingales. In all cases the bounds include sums of
expressions of the form

$$\sup_{x,y \in S} \|P^i(x, \cdot) - P^i(y, \cdot)\|_{TV},$$

where $P^i$ is the $i$ step transition function of the chain. These results are not well suited
for Markov chains which are not uniformly ergodic (like the chain in Section 3.3), since
for such chains the summands are bounded from below by a constant (which spoils
the dependence on $n$ in the estimates). It would be interesting to know if in results of
this type, the supremum of the total variation distances can be replaced by some other
norm, for instance a kind of average. This would allow to extend the estimates to some
classes of non-uniformly ergodic Markov chains.

Inequalities of the bounded difference type for sums $f(X_1) + \ldots + f(X_n)$ where $X_i$'s
form a uniformly ergodic Markov chain were also obtained by Glynn and Ormoneit [6].
Their method was to analyze the Poisson equation associated with the chain. Their
result has been complemented by an information theoretic approach in Kontoyiannis
et al. [9].

Estimates for sums, in terms of variance, appeared in the work by Samson [23], who
presents a result for empirical processes of uniformly ergodic chains. He gives a real
concentration inequality around the mean (and not just a tail bound as in Theorem 7). The
coefficient responsible for the subgaussian behavior of the tail is $\mathbb{E}\sum_{i=1}^{n} \sup_f f(X_i)^2$.
Replacing it with $V = \mathbb{E}\sup_f \sum_{i=1}^{n} f(X_i)^2$ (which would correspond to the original
Talagrand's inequality) is stated in Samson's work as an open problem, which to our best
knowledge has not been yet solved. Additionally, in Samson's estimate there is no $\log n$
factor, which is present in Theorems 6 and 7. Since we have shown that in our setting
this factor is indispensable, we would like to comment on the differences between the
results by Samson and ours.

Obviously, the first difference is the setting. Although uniformly ergodic chains
satisfy our assumptions $\|T_1\|_{\psi_1}, \|T_2\|_{\psi_1} < \infty$, the Minorization condition may not hold
for them with $m = 1$, which restricts our results to linear statistics of the chain
(Theorem 6). However, there are many examples of non-uniformly ergodic chains, for which
one cannot apply Samson's result but which satisfy our assumptions. Such chains have
been considered in the MCMC theory.

When specialized to sums of real variables, Samson's result can be considered a
counterpart of the Bernstein inequality, valid for uniformly ergodic Markov chains.
The subgaussian part of the estimate is controlled by $\sum_{i=1}^{n} \mathbb{E}f(X_i)^2$, which can be
much bigger than the asymptotic variance and therefore does not reflect the limiting
behaviour of $f(X_1) + \ldots + f(X_n)$. Consider for instance a chain consisting of the origin
connected with finitely many loops in which, similarly as in the example from Section
3.3, the randomness appears only at the origin (i.e. after the choice of the loop the
particle travels along it deterministically until the next return to the origin). Then,
one can easily construct a function $f$ with values in $\{\pm 1\}$, centered with respect to the
stationary distribution and such that its asymptotic variance is equal to zero, whereas
$\sum_{i=1}^{n} \mathbb{E}f(X_i)^2 = n$ for all $n$ (it happens for instance if the sum of the values of $f$ along
each loop vanishes). In consequence, $n^{-1/2}(f(X_1) + \ldots + f(X_n))$ converges weakly to
the Dirac mass at 0 and we have

$$\mathbb{P}(|f(X_1) + \ldots + f(X_n)| \ge \sqrt{n}\,t) \to 0$$

for all $t > 0$, which is not recovered by Samson's estimate. One can also construct
other examples of similar flavour, in which the asymptotic variance is nonzero but is
still much smaller than $\sum_{i=1}^{n} \mathbb{E}f(X_i)^2$.
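
The loop construction with vanishing asymptotic variance can be made concrete with a very small example. The sketch below uses a hypothetical chain with two loops attached to the origin and a function f with values in {+1, -1} whose values sum to zero over each full cycle (the origin followed by one loop). The state names, the loop lengths and the uniform choice between loops are illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical toy chain: from the origin 0 the particle enters loop "a" (one state)
# or loop "b" (three states) with equal probability and then returns to 0.
LOOPS = {"a": ["a1"], "b": ["b1", "b2", "b3"]}
F = {0: 1, "a1": -1, "b1": -1, "b2": -1, "b3": 1}   # f sums to zero over each full cycle

def trajectory(n, rng):
    """First n states of the chain started at the origin."""
    path = []
    while len(path) < n:
        path.append(0)
        path.extend(LOOPS[rng.choice(("a", "b"))])
    return path[:n]

rng = random.Random(3)
n = 10000
path = trajectory(n, rng)
partial, max_abs = 0, 0
for x in path:
    partial += F[x]
    max_abs = max(max_abs, abs(partial))
sum_squares = sum(F[x] ** 2 for x in path)
print(max_abs, sum_squares)
```

In this toy chain the partial sums never leave the set {-1, 0, 1}, so the normalized sums tend to zero, while the sum of squares grows linearly in n.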

On the other hand Samson's results do not require the condition $\mathbb{E}_\pi f = 0$ and
(as already mentioned) in the case of empirical processes they provide a two sided
concentration around the mean.

As for the $\log n$ factor, at present we do not know if at the cost of replacing the
asymptotic variance with $\sum_{i=1}^{n} \mathbb{E}f(X_i)^2$ one can eliminate it in our setting.

Summarizing, our inequalities, when compared to known results have both advan- 
tages and disadvantages. On the one hand, when specialized to uniformly ergodic 
Markov chains, they do not recover the full generality or strength of previous estimates 
(for instance Theorem 8 is restricted to symmetric statistics and m = 1), on the other
hand they may be applied to Markov chains arising in statistical applications, which 
are not uniformly ergodic (and therefore beyond the scope of the estimates presented 
above). Another property, which in our opinion, makes the estimates of Theorems [6] 
and [7] interesting (at least from the theoretical point of view) is the fact that for m = 1, 
the coefficient responsible for the Gaussian level of concentration corresponds to the 
variance of the limiting Gaussian distribution. 

Acknowledgements The author would like to thank Witold Bednorz and Krzysztof
Łatuszyński for their useful comments concerning the results presented in this article
as well as the anonymous Referee, whose remarks helped improve their presentation.

References 

[1] Baxendale P. H. Renewal theory and computable convergence rates for geomet-
rically ergodic Markov chains. Ann. Appl. Probab. 15 (2005), no. 1B, 700-738.
MR2114987.

[2] Bousquet O. A Bennett concentration inequality and its application to suprema 
of empirical processes. C. R. Math. Acad. Sci. Paris 334 (2002), no. 6, 495-500. 
MR1890640. 






[3] Bousquet O., Boucheron S., Lugosi G., Massart P. Moment inequalities for
functions of independent random variables. Ann. Probab. 33 (2005), no. 2, 514-560.
MR2123200.

[4] Einmahl U., Li D. Characterization of LIL behavior in Banach space. To appear 
in Trans. Am. Math. Soc. 

[5] Giné E., Latała R., Zinn J. Exponential and moment inequalities for U-
statistics. In High Dimensional Probability II, 13-38. Progr. Probab. 47. Birkhäuser
Boston, Boston, MA, 2000. MR1857312.

[6] Glynn P. W., Ormoneit D. Hoeffding's inequality for uniformly ergodic Markov 
chains. Statist. Probab. Lett. 56 (2002), no. 2, 143-146. MR1881167. 

[7] Klein T., Rio, E. Concentration around the mean for maxima of empirical pro- 
cesses. Ann. Probab. 33 (2005), no. 3, 1060-1077. MR2135312. 

[8] Kontorovich L., Ramanan K. Concentration Inequalities for Dependent Ran- 
dom Variables via the Martingale Method. To appear in Ann. Probab. 

[9] Kontoyiannis I., Lastras-Montano L., Meyn S. P. Relative Entropy and 
Exponential Deviation Bounds for General Markov Chains. 2005 IEEE International 
Symposium on Information Theory. 

[10] Ledoux M. On Talagrand's deviation inequalities for product measures. ESAIM: 
Probability and Statistics, 1(1996), 63-87. MR1399224. 

[11] Ledoux M. The concentration of measure phenomenon. Mathematical Sur- 
veys and Monographs, 89. American Mathematical Society, Providence, RI, 2001. 
MR1849347. 

[12] Ledoux M., Talagrand M. Probability in Banach spaces. Isoperimetry and 
processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), 23. Springer- 
Verlag, Berlin, 1991. MR1102015. 

[13] Marton K. A measure concentration inequality for contracting Markov chains.
Geom. Funct. Anal. 6 (1996), no. 3, 556-571. MR1392329. 

[14] Marton K. Erratum to: "A measure concentration inequality for contracting 
Markov chains". Geom. Funct. Anal. 6 (1996), no. 3, 556-571. MR1466340. 

[15] Marton, K. Measure concentration for a class of random processes. Probab. 
Theory Related Fields 110 (1998), no. 3, 427-439. MR1616492. 

[16] Massart, P. About the constants in Talagrand's concentration inequalities for 
empirical processes. Ann. Probab. 28 (2000), no. 2, 863-884. MR1782276. 






[17] Meyn, S. P., Tweedie, R. L. Markov chains and stochastic stability. Commu-
nications and Control Engineering Series. Springer-Verlag London, Ltd., London,
1993. MR1287609.

[18] Montgomery-Smith S.J. Comparison of sums of independent identically dis- 
tributed random vectors. Probab. Math. Statist. 14 (1993), no. 2, 281-285. 
MR1321767. 

[19] Tomczak-Jaegermann N., Mendelson S. A subgaussian embedding theorem. 
To appear in Israel J. Math. 

[20] Panchenko D. Symmetrization approach to concentration inequalities for empir- 
ical processes. Ann. Probab. 31 (2003), no. 4, 2068-2081. MR2016612. 

[21] Pisier, G., Some applications of the metric entropy condition to harmonic anal- 
ysis. Banach spaces, harmonic analysis, and probability theory, 123-154, Lecture 
Notes in Math., 995, Springer, Berlin, 1983. MR0717231. 

[22] Roberts, G. O., Rosenthal, J. S. General state space Markov chains and 
MCMC algorithms. Probab. Surv. 1 (2004), 20-71. MR2095565. 

[23] Samson, P.M. Concentration of measure inequalities for Markov chains and
Φ-mixing processes. Ann. Probab. 28 (2000), no. 1, 416-461. MR1756011.

[24] Talagrand M. New concentration inequalities in product spaces. Invent. Math. 
126 (1996), no. 3, 505-563. MR1419006. 

[25] van der Vaart, Aad W., Wellner, Jon A. Weak convergence and empirical 
processes. With applications to statistics. Springer Series in Statistics. Springer- 
Verlag, New York, 1996. MR1385671 






